bzip2-1.0.2
parent 795b859eee
commit 099d844292
86
CHANGES
@ -165,3 +165,89 @@ There are no functionality changes or bug fixes relative to version
1.0.0. This is just a documentation update + a fix for minor Win32
build problems. For almost everyone, upgrading from 1.0.0 to 1.0.1 is
utterly pointless. Don't bother.


1.0.2
~~~~~
A bug fix release, addressing various minor issues which have appeared
in the 18 or so months since 1.0.1 was released. Most of the fixes
are to do with file-handling or documentation bugs. To the best of my
knowledge, there have been no data-loss-causing bugs reported in the
compression/decompression engine of 1.0.0 or 1.0.1.

Note that this release does not improve the rather crude build system
for Unix platforms. The general plan here is to autoconfiscate/
libtoolise 1.0.2 soon after release, and release the result as 1.1.0
or perhaps 1.2.0. That, however, is still just a plan at this point.

Here are the changes in 1.0.2. Bug-reporters and/or patch-senders in
parentheses.

* Fix an infinite segfault loop in 1.0.1 when a directory is
encountered in -f (force) mode.
(Trond Eivind Glomsrod, Nicholas Nethercote, Volker Schmidt)

* Avoid double fclose() of output file on certain I/O error paths.
(Solar Designer)

* Don't fail with internal error 1007 when fed a long stream (> 48MB)
of byte 251. Also print useful message suggesting that 1007s may be
caused by bad memory.
(noticed by Juan Pedro Vallejo, fixed by me)

* Fix uninitialised variable silly bug in demo prog dlltest.c.
(Jorj Bauer)

* Remove 512-MB limitation on recovered file size for bzip2recover
on selected platforms which support 64-bit ints. At the moment
all GCC supported platforms, and Win32.
(me, Alson van der Meulen)

* Hard-code header byte values, to give correct operation on platforms
using EBCDIC as their native character set (IBM's OS/390).
(Leland Lucius)

* Copy file access times correctly.
(Marty Leisner)

* Add distclean and check targets to Makefile.
(Michael Carmack)

* Parameterise use of ar and ranlib in Makefile. Also add $(LDFLAGS).
(Rich Ireland, Bo Thorsen)

* Pass -p (create parent dirs as needed) to mkdir during make install.
(Jeremy Fusco)

* Dereference symlinks when copying file permissions in -f mode.
(Volker Schmidt)

* Majorly simplify implementation of uInt64_qrm10.
(Bo Lindbergh)

* Check the input file still exists before deleting the output one,
when aborting in cleanUpAndFail().
(Joerg Prante, Robert Linden, Matthias Krings)

Also a bunch of patches courtesy of Philippe Troin, the Debian maintainer
of bzip2:

* Wrapper scripts (with manpages): bzdiff, bzgrep, bzmore.

* Spelling changes and minor enhancements in bzip2.1.

* Avoid race condition between creating the output file and setting its
interim permissions safely, by using fopen_output_safely().
No changes to bzip2recover since there is no issue with file
permissions there.

* do not print senseless report with -v when compressing an empty
file.

* bzcat -f works on non-bzip2 files.

* do not try to escape shell meta-characters on unix (the shell takes
care of these).

* added --fast and --best aliases for -1 -9 for gzip compatibility.
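Note on the fopen_output_safely() entry in the CHANGES hunk above: a race between creating a file and tightening its permissions is usually closed by creating the file exclusively, with restrictive permissions, in a single call. The following is only a minimal sketch of that general technique, assuming a POSIX system. The function name open_output_exclusive is hypothetical, and this is not claimed to be the body of bzip2's own fopen_output_safely().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not taken from bzip2.c): create `name` and open
   it for writing in one step, readable and writable by the owner only.
   O_EXCL makes open() fail if the path already exists, so there is no
   window in which the file exists with looser permissions. */
FILE *open_output_exclusive(const char *name, const char *mode)
{
    int fd;
    FILE *fp;

    assert(name != NULL && mode != NULL);

    fd = open(name, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return NULL;        /* path already exists, or cannot be created */

    fp = fdopen(fd, mode);  /* wrap the descriptor in a stdio stream */
    if (fp == NULL)
        close(fd);          /* do not leak the descriptor on failure */
    return fp;
}
```

A caller would open the output with a write mode such as "wb" and set the final permissions only after the data has been written.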
4
LICENSE
@ -1,6 +1,6 @@
This program, "bzip2" and associated library "libbzip2", are
copyright (C) 1996-2000 Julian R Seward. All rights reserved.
copyright (C) 1996-2002 Julian R Seward. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
@ -35,5 +35,5 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Julian Seward, Cambridge, UK.
jseward@acm.org
bzip2/libbzip2 version 1.0 of 21 March 2000
bzip2/libbzip2 version 1.0.2 of 30 December 2001
81
Makefile
@ -1,9 +1,20 @@
SHELL=/bin/sh

# To assist in cross-compiling
CC=gcc
AR=ar
RANLIB=ranlib
LDFLAGS=

# Suitably paranoid flags to avoid bugs in gcc-2.7
BIGFILES=-D_FILE_OFFSET_BITS=64
CFLAGS=-Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce $(BIGFILES)

# Where you want it installed when you do 'make install'
PREFIX=/usr


OBJS= blocksort.o \
huffman.o \
crctable.o \
@ -15,20 +26,21 @@ OBJS= blocksort.o \
all: libbz2.a bzip2 bzip2recover test

bzip2: libbz2.a bzip2.o
$(CC) $(CFLAGS) -o bzip2 bzip2.o -L. -lbz2
$(CC) $(CFLAGS) $(LDFLAGS) -o bzip2 bzip2.o -L. -lbz2

bzip2recover: bzip2recover.o
$(CC) $(CFLAGS) -o bzip2recover bzip2recover.o
$(CC) $(CFLAGS) $(LDFLAGS) -o bzip2recover bzip2recover.o

libbz2.a: $(OBJS)
rm -f libbz2.a
ar cq libbz2.a $(OBJS)
@if ( test -f /usr/bin/ranlib -o -f /bin/ranlib -o \
-f /usr/ccs/bin/ranlib ) ; then \
echo ranlib libbz2.a ; \
ranlib libbz2.a ; \
$(AR) cq libbz2.a $(OBJS)
@if ( test -f $(RANLIB) -o -f /usr/bin/ranlib -o \
-f /bin/ranlib -o -f /usr/ccs/bin/ranlib ) ; then \
echo $(RANLIB) libbz2.a ; \
$(RANLIB) libbz2.a ; \
fi

check: test
test: bzip2
@cat words1
./bzip2 -1 < sample1.ref > sample1.rb2
@ -45,14 +57,12 @@ test: bzip2
cmp sample3.tst sample3.ref
@cat words3

PREFIX=/usr

install: bzip2 bzip2recover
if ( test ! -d $(PREFIX)/bin ) ; then mkdir $(PREFIX)/bin ; fi
if ( test ! -d $(PREFIX)/lib ) ; then mkdir $(PREFIX)/lib ; fi
if ( test ! -d $(PREFIX)/man ) ; then mkdir $(PREFIX)/man ; fi
if ( test ! -d $(PREFIX)/man/man1 ) ; then mkdir $(PREFIX)/man/man1 ; fi
if ( test ! -d $(PREFIX)/include ) ; then mkdir $(PREFIX)/include ; fi
if ( test ! -d $(PREFIX)/bin ) ; then mkdir -p $(PREFIX)/bin ; fi
if ( test ! -d $(PREFIX)/lib ) ; then mkdir -p $(PREFIX)/lib ; fi
if ( test ! -d $(PREFIX)/man ) ; then mkdir -p $(PREFIX)/man ; fi
if ( test ! -d $(PREFIX)/man/man1 ) ; then mkdir -p $(PREFIX)/man/man1 ; fi
if ( test ! -d $(PREFIX)/include ) ; then mkdir -p $(PREFIX)/include ; fi
cp -f bzip2 $(PREFIX)/bin/bzip2
cp -f bzip2 $(PREFIX)/bin/bunzip2
cp -f bzip2 $(PREFIX)/bin/bzcat
@ -67,7 +77,26 @@ install: bzip2 bzip2recover
chmod a+r $(PREFIX)/include/bzlib.h
cp -f libbz2.a $(PREFIX)/lib
chmod a+r $(PREFIX)/lib/libbz2.a
cp -f bzgrep $(PREFIX)/bin/bzgrep
ln $(PREFIX)/bin/bzgrep $(PREFIX)/bin/bzegrep
ln $(PREFIX)/bin/bzgrep $(PREFIX)/bin/bzfgrep
chmod a+x $(PREFIX)/bin/bzgrep
cp -f bzmore $(PREFIX)/bin/bzmore
ln $(PREFIX)/bin/bzmore $(PREFIX)/bin/bzless
chmod a+x $(PREFIX)/bin/bzmore
cp -f bzdiff $(PREFIX)/bin/bzdiff
ln $(PREFIX)/bin/bzdiff $(PREFIX)/bin/bzcmp
chmod a+x $(PREFIX)/bin/bzdiff
cp -f bzgrep.1 bzmore.1 bzdiff.1 $(PREFIX)/man/man1
chmod a+r $(PREFIX)/man/man1/bzgrep.1
chmod a+r $(PREFIX)/man/man1/bzmore.1
chmod a+r $(PREFIX)/man/man1/bzdiff.1
echo ".so man1/bzgrep.1" > $(PREFIX)/man/man1/bzegrep.1
echo ".so man1/bzgrep.1" > $(PREFIX)/man/man1/bzfgrep.1
echo ".so man1/bzmore.1" > $(PREFIX)/man/man1/bzless.1
echo ".so man1/bzdiff.1" > $(PREFIX)/man/man1/bzcmp.1

distclean: clean
clean:
rm -f *.o libbz2.a bzip2 bzip2recover \
sample1.rb2 sample2.rb2 sample3.rb2 \
@ -93,7 +122,7 @@ bzip2.o: bzip2.c
bzip2recover.o: bzip2recover.c
$(CC) $(CFLAGS) -c bzip2recover.c

DISTNAME=bzip2-1.0.1
DISTNAME=bzip2-1.0.2
tarfile:
rm -f $(DISTNAME)
ln -sf . $(DISTNAME)
@ -112,6 +141,7 @@ tarfile:
$(DISTNAME)/Makefile \
$(DISTNAME)/manual.texi \
$(DISTNAME)/manual.ps \
$(DISTNAME)/manual.pdf \
$(DISTNAME)/LICENSE \
$(DISTNAME)/bzip2.1 \
$(DISTNAME)/bzip2.1.preformatted \
@ -138,4 +168,25 @@ tarfile:
$(DISTNAME)/Y2K_INFO \
$(DISTNAME)/unzcrash.c \
$(DISTNAME)/spewG.c \
$(DISTNAME)/mk251.c \
$(DISTNAME)/bzdiff \
$(DISTNAME)/bzdiff.1 \
$(DISTNAME)/bzmore \
$(DISTNAME)/bzmore.1 \
$(DISTNAME)/bzgrep \
$(DISTNAME)/bzgrep.1 \
$(DISTNAME)/Makefile-libbz2_so
gzip -v $(DISTNAME).tar

# For rebuilding the manual from sources on my RedHat 7.2 box
manual: manual.ps manual.pdf manual.html

manual.ps: manual.texi
tex manual.texi
dvips -o manual.ps manual.dvi

manual.pdf: manual.ps
ps2pdf manual.ps

manual.html: manual.texi
texi2html -split_chapter manual.texi
@ -1,8 +1,9 @@
# This Makefile builds a shared version of the library,
# libbz2.so.1.0.1, with soname libbz2.so.1.0,
# at least on x86-Linux (RedHat 5.2),
# with gcc-2.7.2.3. Please see the README file for some
# libbz2.so.1.0.2, with soname libbz2.so.1.0,
# at least on x86-Linux (RedHat 7.2),
# with gcc-2.96 20000731 (Red Hat Linux 7.1 2.96-98).
# Please see the README file for some
# important info about building the library like this.

SHELL=/bin/sh
@ -19,13 +20,13 @@ OBJS= blocksort.o \
bzlib.o

all: $(OBJS)
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
$(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.1
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.2 $(OBJS)
$(CC) $(CFLAGS) -o bzip2-shared bzip2.c libbz2.so.1.0.2
rm -f libbz2.so.1.0
ln -s libbz2.so.1.0.1 libbz2.so.1.0
ln -s libbz2.so.1.0.2 libbz2.so.1.0

clean:
rm -f $(OBJS) bzip2.o libbz2.so.1.0.1 libbz2.so.1.0 bzip2-shared
rm -f $(OBJS) bzip2.o libbz2.so.1.0.2 libbz2.so.1.0 bzip2-shared

blocksort.o: blocksort.c
$(CC) $(CFLAGS) -c blocksort.c
73
README
@ -1,15 +1,15 @@
This is the README for bzip2, a block-sorting file compressor, version
1.0. This version is fully compatible with the previous public
releases, bzip2-0.1pl2, bzip2-0.9.0 and bzip2-0.9.5.
1.0.2. This version is fully compatible with the previous public
releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1.

bzip2-1.0 is distributed under a BSD-style license. For details,
bzip2-1.0.2 is distributed under a BSD-style license. For details,
see the file LICENSE.

Complete documentation is available in Postscript form (manual.ps) or
html (manual_toc.html). A plain-text version of the manual page is
available as bzip2.txt. A statement about Y2K issues is now included
in the file Y2K_INFO.
Complete documentation is available in Postscript form (manual.ps),
PDF (manual.pdf, amazingly enough) or html (manual_toc.html). A
plain-text version of the manual page is available as bzip2.txt.
A statement about Y2K issues is now included in the file Y2K_INFO.


HOW TO BUILD -- UNIX
@ -33,34 +33,41 @@ not actually execute them.
HOW TO BUILD -- UNIX, shared library libbz2.so.

Do 'make -f Makefile-libbz2_so'. This Makefile seems to work for
Linux-ELF (RedHat 5.2 on an x86 box), with gcc. I make no claims
Linux-ELF (RedHat 7.2 on an x86 box), with gcc. I make no claims
that it works for any other platform, though I suspect it probably
will work for most platforms employing both ELF and gcc.

bzip2-shared, a client of the shared library, is also build, but
not self-tested. So I suggest you also build using the normal
Makefile, since that conducts a self-test.
bzip2-shared, a client of the shared library, is also built, but not
self-tested. So I suggest you also build using the normal Makefile,
since that conducts a self-test. A second reason to prefer the
version statically linked to the library is that, on x86 platforms,
building shared objects makes a valuable register (%ebx) unavailable
to gcc, resulting in a slowdown of 10%-20%, at least for bzip2.

Important note for people upgrading .so's from 0.9.0/0.9.5 to
version 1.0. All the functions in the library have been renamed,
from (eg) bzCompress to BZ2_bzCompress, to avoid namespace pollution.
Important note for people upgrading .so's from 0.9.0/0.9.5 to version
1.0.X. All the functions in the library have been renamed, from (eg)
bzCompress to BZ2_bzCompress, to avoid namespace pollution.
Unfortunately this means that the libbz2.so created by
Makefile-libbz2_so will not work with any program which used an
older version of the library. Sorry. I do encourage library
clients to make the effort to upgrade to use version 1.0, since
it is both faster and more robust than previous versions.
Makefile-libbz2_so will not work with any program which used an older
version of the library. Sorry. I do encourage library clients to
make the effort to upgrade to use version 1.0, since it is both faster
and more robust than previous versions.


HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc.

It's difficult for me to support compilation on all these platforms.
My approach is to collect binaries for these platforms, and put them
on the master web page (http://sourceware.cygnus.com/bzip2). Look
there. However (FWIW), bzip2-1.0 is very standard ANSI C and should
compile unmodified with MS Visual C. For Win32, there is one
important caveat: in bzip2.c, you must set BZ_UNIX to 0 and
BZ_LCCWIN32 to 1 before building. If you have difficulties building,
you might want to read README.COMPILATION.PROBLEMS.
on the master web page (http://sources.redhat.com/bzip2). Look there.
However (FWIW), bzip2-1.0.X is very standard ANSI C and should compile
unmodified with MS Visual C. If you have difficulties building, you
might want to read README.COMPILATION.PROBLEMS.

At least using MS Visual C++ 6, you can build from the unmodified
sources by issuing, in a command shell:
nmake -f makefile.msc
(you may need to first run the MSVC-provided script VCVARS32.BAT
so as to set up paths to the MSVC tools correctly).


VALIDATION
@ -138,24 +145,31 @@ WHAT'S NEW IN 0.9.5 ?
* Many small improvements in file and flag handling.
* A Y2K statement.

WHAT'S NEW IN 1.0
WHAT'S NEW IN 1.0.0 ?

See the CHANGES file.

WHAT'S NEW IN 1.0.2 ?

See the CHANGES file.


I hope you find bzip2 useful. Feel free to contact me at
jseward@acm.org
if you have any suggestions or queries. Many people mailed me with
comments, suggestions and patches after the releases of bzip-0.15,
bzip-0.21, bzip2-0.1pl2 and bzip2-0.9.0, and the changes in bzip2 are
largely a result of this feedback. I thank you for your comments.
bzip-0.21, and bzip2 versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
and the changes in bzip2 are largely a result of this feedback.
I thank you for your comments.

At least for the time being, bzip2's "home" is (or can be reached via)
http://www.muraroa.demon.co.uk.
http://sources.redhat.com/bzip2.

Julian Seward
jseward@acm.org

Cambridge, UK
Cambridge, UK (and what a great town this is!)

18 July 1996 (version 0.15)
25 August 1996 (version 0.21)
7 August 1997 (bzip2, version 0.1)
@ -164,3 +178,4 @@ Cambridge, UK
8 June 1999 (bzip2, version 0.9.5)
4 Sept 1999 (bzip2, version 0.9.5d)
5 May 2000 (bzip2, version 1.0pre8)
30 December 2001 (bzip2, version 1.0.2pre1)
@ -117,11 +117,11 @@ Known problems as of 1.0pre8:
All that said: you might be able to get somewhere
by finding the line in Makefile-libbz2_so which says

$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.1 $(OBJS)
$(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.2 $(OBJS)

and replacing with

($CC) -G -shared -o libbz2.so.1.0.1 -h libbz2.so.1.0 $(OBJS)
$(CC) -G -shared -o libbz2.so.1.0.2 -h libbz2.so.1.0 $(OBJS)

If gcc objects to the combination -fpic -fPIC, get rid of
the second one, leaving just "-fpic".
11
blocksort.c
@ -8,7 +8,7 @@
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.

Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
@ -981,7 +981,14 @@ void mainSort ( UInt32* ptr,
}
}

AssertH ( copyStart[ss]-1 == copyEnd[ss], 1007 );
AssertH ( (copyStart[ss]-1 == copyEnd[ss])
||
/* Extremely rare case missing in bzip2-1.0.0 and 1.0.1.
Necessity for this case is demonstrated by compressing
a sequence of approximately 48.5 million of character
251; 1.0.0/1.0.1 will then die here. */
(copyStart[ss] == 0 && copyEnd[ss] == nblock-1),
1007 )

for (j = 0; j <= 255; j++) ftab[(j << 8) + ss] |= SETMASK;
76
bzdiff
Normal file
@ -0,0 +1,76 @@
#!/bin/sh
# sh is buggy on RS/6000 AIX 3.2. Replace above line with #!/bin/ksh

# Bzcmp/diff wrapped for bzip2,
# adapted from zdiff by Philippe Troin <phil@fifi.org> for Debian GNU/Linux.

# Bzcmp and bzdiff are used to invoke the cmp or the diff pro-
# gram on compressed files. All options specified are passed
# directly to cmp or diff. If only 1 file is specified, then
# the files compared are file1 and an uncompressed file1.gz.
# If two files are specified, then they are uncompressed (if
# necessary) and fed to cmp or diff. The exit status from cmp
# or diff is preserved.

PATH="/usr/bin:$PATH"; export PATH
prog=`echo $0 | sed 's|.*/||'`
case "$prog" in
*cmp) comp=${CMP-cmp} ;;
*) comp=${DIFF-diff} ;;
esac

OPTIONS=
FILES=
for ARG
do
case "$ARG" in
-*) OPTIONS="$OPTIONS $ARG";;
*) if test -f "$ARG"; then
FILES="$FILES $ARG"
else
echo "${prog}: $ARG not found or not a regular file"
exit 1
fi ;;
esac
done
if test -z "$FILES"; then
echo "Usage: $prog [${comp}_options] file [file]"
exit 1
fi
tmp=`tempfile -d /tmp -p bz` || {
echo 'cannot create a temporary file' >&2
exit 1
}
set $FILES
if test $# -eq 1; then
FILE=`echo "$1" | sed 's/.bz2$//'`
bzip2 -cd "$FILE.bz2" | $comp $OPTIONS - "$FILE"
STAT="$?"

elif test $# -eq 2; then
case "$1" in
*.bz2)
case "$2" in
*.bz2)
F=`echo "$2" | sed 's|.*/||;s|.bz2$||'`
bzip2 -cdfq "$2" > $tmp
bzip2 -cdfq "$1" | $comp $OPTIONS - $tmp
STAT="$?"
/bin/rm -f $tmp;;

*) bzip2 -cdfq "$1" | $comp $OPTIONS - "$2"
STAT="$?";;
esac;;
*) case "$2" in
*.bz2)
bzip2 -cdfq "$2" | $comp $OPTIONS "$1" -
STAT="$?";;
*) $comp $OPTIONS "$1" "$2"
STAT="$?";;
esac;;
esac
exit "$STAT"
else
echo "Usage: $prog [${comp}_options] file [file]"
exit 1
fi
47
bzdiff.1
Normal file
@ -0,0 +1,47 @@
\"Shamelessly copied from zmore.1 by Philippe Troin <phil@fifi.org>
\"for Debian GNU/Linux
.TH BZDIFF 1
.SH NAME
bzcmp, bzdiff \- compare bzip2 compressed files
.SH SYNOPSIS
.B bzcmp
[ cmp_options ] file1
[ file2 ]
.br
.B bzdiff
[ diff_options ] file1
[ file2 ]
.SH DESCRIPTION
.I Bzcmp
and
.I bzdiff
are used to invoke the
.I cmp
or the
.I diff
program on bzip2 compressed files. All options specified are passed
directly to
.I cmp
or
.IR diff "."
If only 1 file is specified, then the files compared are
.I file1
and an uncompressed
.IR file1 ".bz2."
If two files are specified, then they are uncompressed if necessary and fed to
.I cmp
or
.IR diff "."
The exit status from
.I cmp
or
.I diff
is preserved.
.SH "SEE ALSO"
cmp(1), diff(1), bzmore(1), bzless(1), bzgrep(1), bzip2(1)
.SH BUGS
Messages from the
.I cmp
or
.I diff
programs refer to temporary filenames instead of those specified.
71
bzgrep
Normal file
@ -0,0 +1,71 @@
#!/bin/sh

# Bzgrep wrapped for bzip2,
# adapted from zgrep by Philippe Troin <phil@fifi.org> for Debian GNU/Linux.
## zgrep notice:
## zgrep -- a wrapper around a grep program that decompresses files as needed
## Adapted from a version sent by Charles Levert <charles@comm.polymtl.ca>

PATH="/usr/bin:$PATH"; export PATH

prog=`echo $0 | sed 's|.*/||'`
case "$prog" in
*egrep) grep=${EGREP-egrep} ;;
*fgrep) grep=${FGREP-fgrep} ;;
*) grep=${GREP-grep} ;;
esac
pat=""
while test $# -ne 0; do
case "$1" in
-e | -f) opt="$opt $1"; shift; pat="$1"
if test "$grep" = grep; then # grep is buggy with -e on SVR4
grep=egrep
fi;;
-A | -B) opt="$opt $1 $2"; shift;;
-*) opt="$opt $1";;
*) if test -z "$pat"; then
pat="$1"
else
break;
fi;;
esac
shift
done

if test -z "$pat"; then
echo "grep through bzip2 files"
echo "usage: $prog [grep_options] pattern [files]"
exit 1
fi

list=0
silent=0
op=`echo "$opt" | sed -e 's/ //g' -e 's/-//g'`
case "$op" in
*l*) list=1
esac
case "$op" in
*h*) silent=1
esac

if test $# -eq 0; then
bzip2 -cdfq | $grep $opt "$pat"
exit $?
fi

res=0
for i do
if test -f "$i"; then :; else if test -f "$i.bz2"; then i="$i.bz2"; fi; fi
if test $list -eq 1; then
bzip2 -cdfq "$i" | $grep $opt "$pat" 2>&1 > /dev/null && echo $i
r=$?
elif test $# -eq 1 -o $silent -eq 1; then
bzip2 -cdfq "$i" | $grep $opt "$pat"
r=$?
else
bzip2 -cdfq "$i" | $grep $opt "$pat" | sed "s|^|${i}:|"
r=$?
fi
test "$r" -ne 0 && res="$r"
done
exit $res
56
bzgrep.1
Normal file
@ -0,0 +1,56 @@
\"Shamelessly copied from zmore.1 by Philippe Troin <phil@fifi.org>
\"for Debian GNU/Linux
.TH BZGREP 1
.SH NAME
bzgrep, bzfgrep, bzegrep \- search possibly bzip2 compressed files for a regular expression
.SH SYNOPSIS
.B bzgrep
[ grep_options ]
.BI [\ -e\ ] " pattern"
.IR filename ".\|.\|."
.br
.B bzegrep
[ egrep_options ]
.BI [\ -e\ ] " pattern"
.IR filename ".\|.\|."
.br
.B bzfgrep
[ fgrep_options ]
.BI [\ -e\ ] " pattern"
.IR filename ".\|.\|."
.SH DESCRIPTION
.IR Bzgrep
is used to invoke the
.I grep
on bzip2-compressed files. All options specified are passed directly to
.I grep.
If no file is specified, then the standard input is decompressed
if necessary and fed to grep.
Otherwise the given files are uncompressed if necessary and fed to
.I grep.
.PP
If
.I bzgrep
is invoked as
.I bzegrep
or
.I bzfgrep
then
.I egrep
or
.I fgrep
is used instead of
.I grep.
If the GREP environment variable is set,
.I bzgrep
uses it as the
.I grep
program to be invoked. For example:

for sh: GREP=fgrep bzgrep string files
for csh: (setenv GREP fgrep; bzgrep string files)
.SH AUTHOR
Charles Levert (charles@comm.polymtl.ca). Adapted to bzip2 by Philippe
Troin <phil@fifi.org> for Debian GNU/Linux.
.SH "SEE ALSO"
grep(1), egrep(1), fgrep(1), bzdiff(1), bzmore(1), bzless(1), bzip2(1)
56
bzip2.1
@ -1,7 +1,7 @@
.PU
.TH bzip2 1
.SH NAME
bzip2, bunzip2 \- a block-sorting file compressor, v1.0
bzip2, bunzip2 \- a block-sorting file compressor, v1.0.2
.br
bzcat \- decompresses files to stdout
.br
@ -197,7 +197,7 @@ to decompress.
.TP
.B \-z --compress
The complement to \-d: forces compression, regardless of the
invokation name.
invocation name.
.TP
.B \-t --test
Check integrity of the specified file(s), but don't decompress them.
@ -211,6 +211,10 @@ existing output files. Also forces
.I bzip2
to break hard links
to files, which it otherwise wouldn't do.

bzip2 normally declines to decompress files which don't have the
correct magic header bytes. If forced (-f), however, it will pass
such files through unmodified. This is how GNU gzip behaves.
.TP
.B \-k --keep
Keep (don't delete) input files during compression
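Note on the "magic header bytes" mentioned in the hunk above: a bzip2 stream starts with the ASCII characters 'B', 'Z', 'h' followed by a block-size digit '1' to '9'. The CHANGES entry about EBCDIC platforms says these values are now hard-coded, because a character literal such as 'B' does not have the value 0x42 on an EBCDIC compiler. A minimal sketch of such a check, where the macro and function names are hypothetical and not taken from the bzip2 sources:

```c
#include <assert.h>
#include <stddef.h>

/* ASCII codes written as numbers instead of character literals, so the
   test gives the same answer whatever the native character set is.
   These names are illustrative only. */
#define HDR_B 0x42  /* 'B' in ASCII */
#define HDR_Z 0x5a  /* 'Z' in ASCII */
#define HDR_h 0x68  /* 'h' in ASCII */
#define HDR_0 0x30  /* '0' in ASCII */

/* Returns 1 if buf starts with "BZh" plus a block-size digit 1..9,
   and 0 otherwise (including when fewer than 4 bytes are given). */
int looks_like_bzip2(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len < 4)
        return 0;
    if (buf[0] != HDR_B || buf[1] != HDR_Z || buf[2] != HDR_h)
        return 0;
    return buf[3] >= HDR_0 + 1 && buf[3] <= HDR_0 + 9;
}
```

Under -f, a file failing a test of this kind is what gets passed through unmodified instead of being rejected.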
@ -239,9 +243,13 @@ information which is primarily of interest for diagnostic purposes.
.B \-L --license -V --version
Display the software version, license terms and conditions.
.TP
.B \-1 to \-9
.B \-1 (or \-\-fast) to \-9 (or \-\-best)
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
effect when decompressing. See MEMORY MANAGEMENT below.
The \-\-fast and \-\-best aliases are primarily for GNU gzip
compatibility. In particular, \-\-fast doesn't make things
significantly faster.
And \-\-best merely selects the default behaviour.
.TP
.B \--
Treats all subsequent arguments as file names, even if they start
@ -352,11 +360,11 @@ undamaged.

.I bzip2recover
takes a single argument, the name of the damaged file,
and writes a number of files "rec0001file.bz2",
"rec0002file.bz2", etc, containing the extracted blocks.
and writes a number of files "rec00001file.bz2",
"rec00002file.bz2", etc, containing the extracted blocks.
The output filenames are designed so that the use of
wildcards in subsequent processing -- for example,
"bzip2 -dc rec*file.bz2 > recovered_data" -- lists the files in
"bzip2 -dc rec*file.bz2 > recovered_data" -- processes the files in
the correct order.

.I bzip2recover
@ -397,27 +405,31 @@ I/O error messages are not as helpful as they could be.
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.

This manual page pertains to version 1.0 of
This manual page pertains to version 1.0.2 of
.I bzip2.
Compressed
data created by this version is entirely forwards and backwards
compatible with the previous public releases, versions 0.1pl2, 0.9.0
and 0.9.5,
but with the following exception: 0.9.0 and above can correctly
decompress multiple concatenated compressed files. 0.1pl2 cannot do
this; it will stop after decompressing just the first file in the
stream.
Compressed data created by this version is entirely forwards and
backwards compatible with the previous public releases, versions
0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, but with the following
exception: 0.9.0 and above can correctly decompress multiple
concatenated compressed files. 0.1pl2 cannot do this; it will stop
after decompressing just the first file in the stream.

.I bzip2recover
uses 32-bit integers to represent bit positions in
compressed files, so it cannot handle compressed files more than 512
megabytes long. This could easily be fixed.
versions prior to this one, 1.0.2, used 32-bit integers to represent
bit positions in compressed files, so it could not handle compressed
files more than 512 megabytes long. Version 1.0.2 and above uses
64-bit ints on some platforms which support them (GNU supported
targets, and Windows). To establish whether or not bzip2recover was
built with such a limitation, run it without arguments. In any event
you can build yourself an unlimited version if you can recompile it
with MaybeUInt64 set to be an unsigned 64-bit integer.



.SH AUTHOR
Julian Seward, jseward@acm.org.

http://sourceware.cygnus.com/bzip2
http://www.muraroa.demon.co.uk
http://sources.redhat.com/bzip2

The ideas embodied in
.I bzip2
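Note on the 512-megabyte figure in the bzip2recover paragraph of the hunk above: it follows directly from counting bit positions in a 32-bit integer. Such a counter covers 2^32 bits, which is 2^32 / 8 = 2^29 bytes, or 512 megabytes. A small sketch of the arithmetic, using a hypothetical helper name that does not appear in bzip2recover.c:

```c
#include <assert.h>
#include <stdint.h>

/* Largest file size, in bytes, whose every bit position can be held in
   an unsigned counter `counter_bits` wide. Restricted to 1..63 so the
   shift below stays within a 64-bit value. */
uint64_t max_bytes_for_bit_counter(unsigned counter_bits)
{
    assert(counter_bits >= 1 && counter_bits <= 63);
    return (UINT64_C(1) << counter_bits) / 8;  /* bits -> bytes */
}
```

With counter_bits set to 32 this gives 536870912 bytes, which is 512 * 1024 * 1024. Widening the counter type, as the text describes for MaybeUInt64, moves the limit far beyond any practical file size.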
@ -434,6 +446,8 @@ indebted for their help, support and advice. See the manual in the
source distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression. Bela Lubkin encouraged me to improve the
worst-case compression performance. Many people sent patches, helped
worst-case compression performance.
The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped
with portability problems, lent machines, gave advice and were generally
helpful.
@ -1,11 +1,9 @@
bzip2(1) bzip2(1)



NNAAMMEE
bzip2, bunzip2 - a block-sorting file compressor, v1.0
bzip2, bunzip2 - a block-sorting file compressor, v1.0.2
bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files

@ -22,20 +20,20 @@ DDEESSCCRRIIPPTTIIOONN
sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of sta-
and approaches the performance of the PPM family of sta
tistical compressors.

The command-line options are deliberately very similar to
those of _G_N_U _g_z_i_p_, but they are not identical.

_b_z_i_p_2 expects a list of file names to accompany the com-
_b_z_i_p_2 expects a list of file names to accompany the com
mand-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date, per-
missions, and, when possible, ownership as the correspond-
Each compressed file has the same modification date, per
missions, and, when possible, ownership as the correspond
ing original, so that these properties can be correctly
restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserv-
naive in the sense that there is no mechanism for preserv
ing original file names, permissions, ownerships or dates
in filesystems which lack these concepts, or have serious
file name length restrictions, such as MS-DOS.
@ -58,18 +56,6 @@ DDEESSCCRRIIPPTTIIOONN
filename.bz2 becomes filename
filename.bz becomes filename
filename.tbz2 becomes filename.tar



1





bzip2(1) bzip2(1)


filename.tbz becomes filename.tar
anyothername becomes anyothername.out

@ -78,23 +64,23 @@ bzip2(1) bzip2(1)
guess the name of the original file, and uses the original
name with _._o_u_t appended.

As with compression, supplying no filenames causes decom-
As with compression, supplying no filenames causes decom
pression from standard input to standard output.

_b_u_n_z_i_p_2 will correctly decompress a file which is the con-
_b_u_n_z_i_p_2 will correctly decompress a file which is the con
catenation of two or more compressed files. The result is
|
||||
the concatenation of the corresponding uncompressed files.
|
||||
Integrity testing (-t) of concatenated compressed files is
|
||||
also supported.
|
||||
|
||||
You can also compress or decompress files to the standard
|
||||
output by giving the -c flag. Multiple files may be com-
|
||||
output by giving the -c flag. Multiple files may be com
|
||||
pressed and decompressed like this. The resulting outputs
|
||||
are fed sequentially to stdout. Compression of multiple
|
||||
files in this manner generates a stream containing multi-
|
||||
files in this manner generates a stream containing multi
|
||||
ple compressed file representations. Such a stream can be
|
||||
decompressed correctly only by _b_z_i_p_2 version 0.9.0 or
|
||||
later. Earlier versions of _b_z_i_p_2 will stop after decom-
|
||||
later. Earlier versions of _b_z_i_p_2 will stop after decom
|
||||
pressing the first file in the stream.
|
||||
|
||||
_b_z_c_a_t (or _b_z_i_p_2 _-_d_c_) decompresses all specified files to
|
||||
@ -115,7 +101,7 @@ bzip2(1) bzip2(1)
|
||||
|
||||
As a self-check for your protection, _b_z_i_p_2 uses 32-bit
|
||||
CRCs to make sure that the decompressed version of a file
|
||||
is identical to the original. This guards against corrup-
|
||||
is identical to the original. This guards against corrup
|
||||
tion of the compressed data, and against undetected bugs
|
||||
in _b_z_i_p_2 (hopefully very unlikely). The chance of data
|
||||
corruption going undetected is microscopic, about one
|
||||
@ -125,17 +111,6 @@ bzip2(1) bzip2(1)
|
||||
you recover the original uncompressed data. You can use
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r to try to recover data from damaged files.
|
||||
|
||||
|
||||
|
||||
2
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
Return values: 0 for a normal exit, 1 for environmental
|
||||
problems (file not found, invalid flags, I/O errors, &c),
|
||||
2 to indicate a corrupt compressed file, 3 for an internal
|
||||
@ -154,8 +129,8 @@ OOPPTTIIOONNSS
|
||||
and forces _b_z_i_p_2 to decompress.
|
||||
|
||||
--zz ----ccoommpprreessss
|
||||
The complement to -d: forces compression, regard-
|
||||
less of the invokation name.
|
||||
The complement to -d: forces compression,
|
||||
regardless of the invocation name.
|
||||
|
||||
--tt ----tteesstt
|
||||
Check integrity of the specified file(s), but don't
|
||||
@ -168,6 +143,11 @@ OOPPTTIIOONNSS
|
||||
forces _b_z_i_p_2 to break hard links to files, which it
|
||||
otherwise wouldn't do.
|
||||
|
||||
bzip2 normally declines to decompress files which
|
||||
don't have the correct magic header bytes. If
|
||||
forced (-f), however, it will pass such files
|
||||
through unmodified. This is how GNU gzip behaves.
|
||||
|
||||
--kk ----kkeeeepp
|
||||
Keep (don't delete) input files during compression
|
||||
or decompression.
|
||||
@ -190,23 +170,11 @@ OOPPTTIIOONNSS
|
||||
--qq ----qquuiieett
|
||||
Suppress non-essential warning messages. Messages
|
||||
pertaining to I/O errors and other critical events
|
||||
|
||||
|
||||
|
||||
3
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
will not be suppressed.
|
||||
|
||||
--vv ----vveerrbboossee
|
||||
Verbose mode -- show the compression ratio for each
|
||||
file processed. Further -v's increase the ver-
|
||||
file processed. Further -v's increase the ver
|
||||
bosity level, spewing out lots of information which
|
||||
is primarily of interest for diagnostic purposes.
|
||||
|
||||
@ -214,20 +182,24 @@ bzip2(1) bzip2(1)
|
||||
Display the software version, license terms and
|
||||
conditions.
|
||||
|
||||
--11 ttoo --99
|
||||
--11 ((oorr ----ffaasstt)) ttoo --99 ((oorr ----bbeesstt))
|
||||
Set the block size to 100 k, 200 k .. 900 k when
|
||||
compressing. Has no effect when decompressing.
|
||||
See MEMORY MANAGEMENT below.
|
||||
See MEMORY MANAGEMENT below. The --fast and --best
|
||||
aliases are primarily for GNU gzip compatibility.
|
||||
In particular, --fast doesn't make things signifi
|
||||
cantly faster. And --best merely selects the
|
||||
default behaviour.
|
||||
|
||||
---- Treats all subsequent arguments as file names, even
|
||||
if they start with a dash. This is so you can han-
|
||||
if they start with a dash. This is so you can han
|
||||
dle files with names beginning with a dash, for
|
||||
example: bzip2 -- -myfilename.
|
||||
|
||||
----rreeppeettiittiivvee--ffaasstt ----rreeppeettiittiivvee--bbeesstt
|
||||
These flags are redundant in versions 0.9.5 and
|
||||
above. They provided some coarse control over the
|
||||
behaviour of the sorting algorithm in earlier ver-
|
||||
behaviour of the sorting algorithm in earlier ver
|
||||
sions, which was sometimes useful. 0.9.5 and above
|
||||
have an improved algorithm which renders these
|
||||
flags irrelevant.
|
||||
@ -238,7 +210,7 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
|
||||
affects both the compression ratio achieved, and the
|
||||
amount of memory needed for compression and decompression.
|
||||
The flags -1 through -9 specify the block size to be
|
||||
100,000 bytes through 900,000 bytes (the default) respec-
|
||||
100,000 bytes through 900,000 bytes (the default) respec
|
||||
tively. At decompression time, the block size used for
|
||||
compression is read from the header of the compressed
|
||||
file, and _b_u_n_z_i_p_2 then allocates itself just enough memory
|
||||
@ -256,18 +228,6 @@ MMEEMMOORRYY MMAANNAAGGEEMMEENNTT
|
||||
|
||||
Larger block sizes give rapidly diminishing marginal
|
||||
returns. Most of the compression comes from the first two
|
||||
|
||||
|
||||
|
||||
4
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
or three hundred k of block size, a fact worth bearing in
|
||||
mind when using _b_z_i_p_2 on small machines. It is also
|
||||
important to appreciate that the decompression memory
|
||||
@ -278,13 +238,13 @@ bzip2(1) bzip2(1)
|
||||
_b_u_n_z_i_p_2 will require about 3700 kbytes to decompress. To
|
||||
support decompression of any file on a 4 megabyte machine,
|
||||
_b_u_n_z_i_p_2 has an option to decompress using approximately
|
||||
half this amount of memory, about 2300 kbytes. Decompres-
|
||||
half this amount of memory, about 2300 kbytes. Decompres
|
||||
sion speed is also halved, so you should use this option
|
||||
only where necessary. The relevant flag is -s.
|
||||
|
||||
In general, try and use the largest block size memory con-
|
||||
In general, try and use the largest block size memory con
|
||||
straints allow, since that maximises the compression
|
||||
achieved. Compression and decompression speed are virtu-
|
||||
achieved. Compression and decompression speed are virtu
|
||||
ally unaffected by block size.
|
||||
|
||||
Another significant point applies to files which fit in a
|
||||
@ -300,11 +260,11 @@ bzip2(1) bzip2(1)
|
||||
|
||||
Here is a table which summarises the maximum memory usage
|
||||
for different block sizes. Also recorded is the total
|
||||
compressed size for 14 files of the Calgary Text Compres-
|
||||
compressed size for 14 files of the Calgary Text Compres
|
||||
sion Corpus totalling 3,141,622 bytes. This column gives
|
||||
some feel for how compression varies with block size.
|
||||
These figures tend to understate the advantage of larger
|
||||
block sizes for larger files, since the Corpus is domi-
|
||||
block sizes for larger files, since the Corpus is domi
|
||||
nated by smaller files.
|
||||
|
||||
Compress Decompress Decompress Corpus
|
||||
@ -321,22 +281,9 @@ bzip2(1) bzip2(1)
|
||||
-9 7600k 3700k 2350k 828642
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
5
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD FFIILLEESS
|
||||
_b_z_i_p_2 compresses files in blocks, usually 900kbytes long.
|
||||
Each block is handled independently. If a media or trans-
|
||||
Each block is handled independently. If a media or trans
|
||||
mission error causes a multi-block .bz2 file to become
|
||||
damaged, it may be possible to recover data from the
|
||||
undamaged blocks in the file.
|
||||
@ -353,19 +300,19 @@ RREECCOOVVEERRIINNGG DDAATTAA FFRROOMM DDAAMMAAGGEEDD F
|
||||
the integrity of the resulting files, and decompress those
|
||||
which are undamaged.
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam-
|
||||
aged file, and writes a number of files "rec0001file.bz2",
|
||||
"rec0002file.bz2", etc, containing the extracted blocks.
|
||||
The output filenames are designed so that the use of
|
||||
wildcards in subsequent processing -- for example, "bzip2
|
||||
-dc rec*file.bz2 > recovered_data" -- lists the files in
|
||||
the correct order.
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r takes a single argument, the name of the dam
|
||||
aged file, and writes a number of files
|
||||
"rec00001file.bz2", "rec00002file.bz2", etc, containing
|
||||
the extracted blocks. The output filenames are
|
||||
designed so that the use of wildcards in subsequent pro
|
||||
cessing -- for example, "bzip2 -dc rec*file.bz2 > recov
|
||||
ered_data" -- processes the files in the correct order.
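
The reason zero-padded block numbers matter can be sketched in C: wildcard expansion typically yields names in sorted order, and with a fixed-width number that order matches the numeric order. This is only an illustration; `recovered_name` is a hypothetical helper, and the format string is an assumption modelled on the names quoted above, not bzip2recover's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a name such as "rec00002file.bz2" from a
   block number and the damaged file's name. */
static void recovered_name(char *out, size_t out_size,
                           int block_no, const char *base)
{
    assert(out != NULL && base != NULL && block_no >= 0);
    /* %05d pads with leading zeros to a fixed width of five digits, so
       comparing two names as strings agrees with comparing their
       block numbers (for block numbers up to 99999). */
    snprintf(out, out_size, "rec%05d%s", block_no, base);
}
```

Without the padding, "rec10file.bz2" would sort before "rec2file.bz2" and the blocks would be concatenated out of order.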
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r should be of most use dealing with large .bz2
|
||||
files, as these will contain many blocks. It is clearly
|
||||
futile to use it on damaged single-block files, since a
|
||||
damaged block cannot be recovered. If you wish to min-
|
||||
imise any potential data loss through media or transmis-
|
||||
damaged block cannot be recovered. If you wish to min
|
||||
imise any potential data loss through media or transmis
|
||||
sion errors, you might consider compressing with a smaller
|
||||
block size.
|
||||
|
||||
@ -379,31 +326,19 @@ PPEERRFFOORRMMAANNCCEE NNOOTTEESS
|
||||
better than previous versions in this respect. The ratio
|
||||
between worst-case and average-case compression time is in
|
||||
the region of 10:1. For previous versions, this figure
|
||||
was more like 100:1. You can use the -vvvv option to mon-
|
||||
was more like 100:1. You can use the -vvvv option to mon
|
||||
itor progress in great detail, if you want.
|
||||
|
||||
Decompression speed is unaffected by these phenomena.
|
||||
|
||||
_b_z_i_p_2 usually allocates several megabytes of memory to
|
||||
operate in, and then charges all over it in a fairly ran-
|
||||
dom fashion. This means that performance, both for com-
|
||||
operate in, and then charges all over it in a fairly ran
|
||||
dom fashion. This means that performance, both for com
|
||||
pressing and decompressing, is largely determined by the
|
||||
|
||||
|
||||
|
||||
6
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
bzip2(1) bzip2(1)
|
||||
|
||||
|
||||
speed at which your machine can service cache misses.
|
||||
Because of this, small changes to the code to reduce the
|
||||
miss rate have been observed to give disproportionately
|
||||
large performance improvements. I imagine _b_z_i_p_2 will per-
|
||||
large performance improvements. I imagine _b_z_i_p_2 will per
|
||||
form best on machines with very large caches.
|
||||
|
||||
|
||||
@ -413,50 +348,51 @@ CCAAVVEEAATTSS
|
||||
but the details of what the problem is sometimes seem
|
||||
rather misleading.
|
||||
|
||||
This manual page pertains to version 1.0 of _b_z_i_p_2_. Com-
|
||||
This manual page pertains to version 1.0.2 of _b_z_i_p_2_. Com
|
||||
pressed data created by this version is entirely forwards
|
||||
and backwards compatible with the previous public
|
||||
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
|
||||
following exception: 0.9.0 and above can correctly decom-
|
||||
press multiple concatenated compressed files. 0.1pl2 can-
|
||||
not do this; it will stop after decompressing just the
|
||||
first file in the stream.
|
||||
releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
|
||||
but with the following exception: 0.9.0 and above can cor
|
||||
rectly decompress multiple concatenated compressed files.
|
||||
0.1pl2 cannot do this; it will stop after decompressing
|
||||
just the first file in the stream.
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r versions prior to this one, 1.0.2, used
|
||||
32-bit integers to represent bit positions in compressed
|
||||
files, so it could not handle compressed files more than
|
||||
512 megabytes long. Version 1.0.2 and above uses 64-bit
|
||||
ints on some platforms which support them (GNU supported
|
||||
targets, and Windows). To establish whether or not
|
||||
bzip2recover was built with such a limitation, run it
|
||||
without arguments. In any event you can build yourself an
|
||||
unlimited version if you can recompile it with MaybeUInt64
|
||||
set to be an unsigned 64-bit integer.
|
||||
|
||||
|
||||
_b_z_i_p_2_r_e_c_o_v_e_r uses 32-bit integers to represent bit posi-
|
||||
tions in compressed files, so it cannot handle compressed
|
||||
files more than 512 megabytes long. This could easily be
|
||||
fixed.
|
||||
|
||||
|
||||
AAUUTTHHOORR
|
||||
Julian Seward, jseward@acm.org.
|
||||
|
||||
http://sourceware.cygnus.com/bzip2
|
||||
http://www.muraroa.demon.co.uk
|
||||
http://sources.redhat.com/bzip2
|
||||
|
||||
The ideas embodied in _b_z_i_p_2 are due to (at least) the fol-
|
||||
The ideas embodied in _b_z_i_p_2 are due to (at least) the fol
|
||||
lowing people: Michael Burrows and David Wheeler (for the
|
||||
block sorting transformation), David Wheeler (again, for
|
||||
the Huffman coder), Peter Fenwick (for the structured cod-
|
||||
the Huffman coder), Peter Fenwick (for the structured cod
|
||||
ing model in the original _b_z_i_p_, and many refinements), and
|
||||
Alistair Moffat, Radford Neal and Ian Witten (for the
|
||||
arithmetic coder in the original _b_z_i_p_)_. I am much
|
||||
indebted for their help, support and advice. See the man-
|
||||
indebted for their help, support and advice. See the man
|
||||
ual in the source distribution for pointers to sources of
|
||||
documentation. Christian von Roques encouraged me to look
|
||||
for faster sorting algorithms, so as to speed up compres-
|
||||
for faster sorting algorithms, so as to speed up compres
|
||||
sion. Bela Lubkin encouraged me to improve the worst-case
|
||||
compression performance. Many people sent patches, helped
|
||||
with portability problems, lent machines, gave advice and
|
||||
were generally helpful.
|
||||
compression performance. The bz* scripts are derived from
|
||||
those of GNU gzip. Many people sent patches, helped with
|
||||
portability problems, lent machines, gave advice and were
|
||||
generally helpful.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
7
|
||||
|
||||
|
||||
bzip2(1)
|
||||
|
425
bzip2.c
@ -7,7 +7,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -113,13 +113,16 @@
|
||||
/*--
|
||||
Generic 32-bit Unix.
|
||||
Also works on 64-bit Unix boxes.
|
||||
This is the default.
|
||||
--*/
|
||||
#define BZ_UNIX 1
|
||||
|
||||
/*--
|
||||
Win32, as seen by Jacob Navia's excellent
|
||||
port of (Chris Fraser & David Hanson)'s excellent
|
||||
lcc compiler.
|
||||
lcc compiler. Or with MS Visual C.
|
||||
This is selected automatically if compiled by a compiler which
|
||||
defines _WIN32, not including the Cygwin GCC.
|
||||
--*/
|
||||
#define BZ_LCCWIN32 0
|
||||
|
||||
@ -156,6 +159,7 @@
|
||||
--*/
|
||||
|
||||
#if BZ_UNIX
|
||||
# include <fcntl.h>
|
||||
# include <sys/types.h>
|
||||
# include <utime.h>
|
||||
# include <unistd.h>
|
||||
@ -164,8 +168,9 @@
|
||||
|
||||
# define PATH_SEP '/'
|
||||
# define MY_LSTAT lstat
|
||||
# define MY_S_IFREG S_ISREG
|
||||
# define MY_STAT stat
|
||||
# define MY_S_ISREG S_ISREG
|
||||
# define MY_S_ISDIR S_ISDIR
|
||||
|
||||
# define APPEND_FILESPEC(root, name) \
|
||||
root=snocString((root), (name))
|
||||
@ -180,11 +185,14 @@
|
||||
# else
|
||||
# define NORETURN /**/
|
||||
# endif
|
||||
|
||||
# ifdef __DJGPP__
|
||||
# include <io.h>
|
||||
# include <fcntl.h>
|
||||
# undef MY_LSTAT
|
||||
# undef MY_STAT
|
||||
# define MY_LSTAT stat
|
||||
# define MY_STAT stat
|
||||
# undef SET_BINARY_MODE
|
||||
# define SET_BINARY_MODE(fd) \
|
||||
do { \
|
||||
@ -193,6 +201,7 @@
|
||||
ERROR_IF_MINUS_ONE ( retVal ); \
|
||||
} while ( 0 )
|
||||
# endif
|
||||
|
||||
# ifdef __CYGWIN__
|
||||
# include <io.h>
|
||||
# include <fcntl.h>
|
||||
@ -204,7 +213,7 @@
|
||||
ERROR_IF_MINUS_ONE ( retVal ); \
|
||||
} while ( 0 )
|
||||
# endif
|
||||
#endif
|
||||
#endif /* BZ_UNIX */
|
||||
|
||||
|
||||
|
||||
@ -217,37 +226,14 @@
|
||||
# define PATH_SEP '\\'
|
||||
# define MY_LSTAT _stat
|
||||
# define MY_STAT _stat
|
||||
# define MY_S_IFREG(x) ((x) & _S_IFREG)
|
||||
# define MY_S_ISREG(x) ((x) & _S_IFREG)
|
||||
# define MY_S_ISDIR(x) ((x) & _S_IFDIR)
|
||||
|
||||
# define APPEND_FLAG(root, name) \
|
||||
root=snocString((root), (name))
|
||||
|
||||
# if 0
|
||||
/*-- lcc-win32 seems to expand wildcards itself --*/
|
||||
# define APPEND_FILESPEC(root, spec) \
|
||||
do { \
|
||||
if ((spec)[0] == '-') { \
|
||||
root = snocString((root), (spec)); \
|
||||
} else { \
|
||||
struct _finddata_t c_file; \
|
||||
long hFile; \
|
||||
hFile = _findfirst((spec), &c_file); \
|
||||
if ( hFile == -1L ) { \
|
||||
root = snocString ((root), (spec)); \
|
||||
} else { \
|
||||
int anInt = 0; \
|
||||
while ( anInt == 0 ) { \
|
||||
root = snocString((root), \
|
||||
&c_file.name[0]); \
|
||||
anInt = _findnext(hFile, &c_file); \
|
||||
} \
|
||||
} \
|
||||
} \
|
||||
} while ( 0 )
|
||||
# else
|
||||
# define APPEND_FILESPEC(root, name) \
|
||||
root = snocString ((root), (name))
|
||||
# endif
|
||||
|
||||
# define SET_BINARY_MODE(fd) \
|
||||
do { \
|
||||
@ -256,7 +242,7 @@
|
||||
ERROR_IF_MINUS_ONE ( retVal ); \
|
||||
} while ( 0 )
|
||||
|
||||
#endif
|
||||
#endif /* BZ_LCCWIN32 */
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
@ -338,6 +324,7 @@ typedef
|
||||
struct { UChar b[8]; }
|
||||
UInt64;
|
||||
|
||||
|
||||
static
|
||||
void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
|
||||
{
|
||||
@ -351,6 +338,7 @@ void uInt64_from_UInt32s ( UInt64* n, UInt32 lo32, UInt32 hi32 )
|
||||
n->b[0] = (UChar) (lo32 & 0xFF);
|
||||
}
|
||||
|
||||
|
||||
static
|
||||
double uInt64_to_double ( UInt64* n )
|
||||
{
|
||||
@ -364,77 +352,6 @@ double uInt64_to_double ( UInt64* n )
|
||||
return sum;
|
||||
}
|
||||
|
||||
static
|
||||
void uInt64_add ( UInt64* src, UInt64* dst )
|
||||
{
|
||||
Int32 i;
|
||||
Int32 carry = 0;
|
||||
for (i = 0; i < 8; i++) {
|
||||
carry += ( ((Int32)src->b[i]) + ((Int32)dst->b[i]) );
|
||||
dst->b[i] = (UChar)(carry & 0xFF);
|
||||
carry >>= 8;
|
||||
}
|
||||
}
|
||||
|
||||
static
|
||||
void uInt64_sub ( UInt64* src, UInt64* dst )
|
||||
{
|
||||
Int32 t, i;
|
||||
Int32 borrow = 0;
|
||||
for (i = 0; i < 8; i++) {
|
||||
t = ((Int32)dst->b[i]) - ((Int32)src->b[i]) - borrow;
|
||||
if (t < 0) {
|
||||
dst->b[i] = (UChar)(t + 256);
|
||||
borrow = 1;
|
||||
} else {
|
||||
dst->b[i] = (UChar)t;
|
||||
borrow = 0;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static
|
||||
void uInt64_mul ( UInt64* a, UInt64* b, UInt64* r_hi, UInt64* r_lo )
|
||||
{
|
||||
UChar sum[16];
|
||||
Int32 ia, ib, carry;
|
||||
for (ia = 0; ia < 16; ia++) sum[ia] = 0;
|
||||
for (ia = 0; ia < 8; ia++) {
|
||||
carry = 0;
|
||||
for (ib = 0; ib < 8; ib++) {
|
||||
carry += ( ((Int32)sum[ia+ib])
|
||||
+ ((Int32)a->b[ia]) * ((Int32)b->b[ib]) );
|
||||
sum[ia+ib] = (UChar)(carry & 0xFF);
|
||||
carry >>= 8;
|
||||
}
|
||||
sum[ia+8] = (UChar)(carry & 0xFF);
|
||||
if ((carry >>= 8) != 0) panic ( "uInt64_mul" );
|
||||
}
|
||||
|
||||
for (ia = 0; ia < 8; ia++) r_hi->b[ia] = sum[ia+8];
|
||||
for (ia = 0; ia < 8; ia++) r_lo->b[ia] = sum[ia];
|
||||
}
|
||||
|
||||
|
||||
static
|
||||
void uInt64_shr1 ( UInt64* n )
|
||||
{
|
||||
Int32 i;
|
||||
for (i = 0; i < 8; i++) {
|
||||
n->b[i] >>= 1;
|
||||
if (i < 7 && (n->b[i+1] & 1)) n->b[i] |= 0x80;
|
||||
}
|
||||
}
|
||||
|
||||
static
|
||||
void uInt64_shl1 ( UInt64* n )
|
||||
{
|
||||
Int32 i;
|
||||
for (i = 7; i >= 0; i--) {
|
||||
n->b[i] <<= 1;
|
||||
if (i > 0 && (n->b[i-1] & 0x80)) n->b[i]++;
|
||||
}
|
||||
}
|
||||
|
||||
static
|
||||
Bool uInt64_isZero ( UInt64* n )
|
||||
@ -445,48 +362,22 @@ Bool uInt64_isZero ( UInt64* n )
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
||||
/* Divide *n by 10, and return the remainder. */
|
||||
static
|
||||
Int32 uInt64_qrm10 ( UInt64* n )
|
||||
{
|
||||
/* Divide *n by 10, and return the remainder. Long division
|
||||
is difficult, so we cheat and instead multiply by
|
||||
0xCCCC CCCC CCCC CCCD, which is 0.8 (viz, 0.1 << 3).
|
||||
*/
|
||||
UInt32 rem, tmp;
|
||||
Int32 i;
|
||||
UInt64 tmp1, tmp2, n_orig, zero_point_eight;
|
||||
|
||||
zero_point_eight.b[1] = zero_point_eight.b[2] =
|
||||
zero_point_eight.b[3] = zero_point_eight.b[4] =
|
||||
zero_point_eight.b[5] = zero_point_eight.b[6] =
|
||||
zero_point_eight.b[7] = 0xCC;
|
||||
zero_point_eight.b[0] = 0xCD;
|
||||
|
||||
n_orig = *n;
|
||||
|
||||
/* divide n by 10,
|
||||
by multiplying by 0.8 and then shifting right 3 times */
|
||||
uInt64_mul ( n, &zero_point_eight, &tmp1, &tmp2 );
|
||||
uInt64_shr1(&tmp1); uInt64_shr1(&tmp1); uInt64_shr1(&tmp1);
|
||||
*n = tmp1;
|
||||
|
||||
/* tmp1 = 8*n, tmp2 = 2*n */
|
||||
uInt64_shl1(&tmp1); uInt64_shl1(&tmp1); uInt64_shl1(&tmp1);
|
||||
tmp2 = *n; uInt64_shl1(&tmp2);
|
||||
|
||||
/* tmp1 = 10*n */
|
||||
uInt64_add ( &tmp2, &tmp1 );
|
||||
|
||||
/* n_orig = n_orig - 10*n */
|
||||
uInt64_sub ( &tmp1, &n_orig );
|
||||
|
||||
/* n_orig should now hold quotient, in range 0 .. 9 */
|
||||
for (i = 7; i >= 1; i--)
|
||||
if (n_orig.b[i] != 0) panic ( "uInt64_qrm10(1)" );
|
||||
if (n_orig.b[0] > 9)
|
||||
panic ( "uInt64_qrm10(2)" );
|
||||
|
||||
return (int)n_orig.b[0];
|
||||
   rem = 0;
   for (i = 7; i >= 0; i--) {
      tmp = rem * 256 + n->b[i];
      n->b[i] = tmp / 10;
      rem = tmp % 10;
   }
   return rem;
}
|
||||
|
||||
|
||||
/* ... and the Whole Entire Point of all this UInt64 stuff is
|
||||
so that we can supply the following function.
|
||||
@ -504,7 +395,8 @@ void uInt64_toAscii ( char* outbuf, UInt64* n )
|
||||
nBuf++;
|
||||
} while (!uInt64_isZero(&n_copy));
|
||||
outbuf[nBuf] = 0;
|
||||
for (i = 0; i < nBuf; i++) outbuf[i] = buf[nBuf-i-1];
|
||||
for (i = 0; i < nBuf; i++)
|
||||
outbuf[i] = buf[nBuf-i-1];
|
||||
}
|
||||
|
||||
|
||||
@ -566,16 +458,18 @@ void compressStream ( FILE *stream, FILE *zStream )
|
||||
if (ret == EOF) goto errhandler_io;
|
||||
if (zStream != stdout) {
|
||||
ret = fclose ( zStream );
|
||||
outputHandleJustInCase = NULL;
|
||||
if (ret == EOF) goto errhandler_io;
|
||||
}
|
||||
outputHandleJustInCase = NULL;
|
||||
if (ferror(stream)) goto errhandler_io;
|
||||
ret = fclose ( stream );
|
||||
if (ret == EOF) goto errhandler_io;
|
||||
|
||||
if (nbytes_in_lo32 == 0 && nbytes_in_hi32 == 0)
|
||||
nbytes_in_lo32 = 1;
|
||||
|
||||
if (verbosity >= 1) {
|
||||
if (nbytes_in_lo32 == 0 && nbytes_in_hi32 == 0) {
|
||||
fprintf ( stderr, " no data compressed.\n");
|
||||
} else {
|
||||
Char buf_nin[32], buf_nout[32];
|
||||
UInt64 nbytes_in, nbytes_out;
|
||||
double nbytes_in_d, nbytes_out_d;
|
||||
@ -596,6 +490,7 @@ void compressStream ( FILE *stream, FILE *zStream )
|
||||
buf_nout
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
return;
|
||||
|
||||
@ -652,7 +547,7 @@ Bool uncompressStream ( FILE *zStream, FILE *stream )
|
||||
|
||||
while (bzerr == BZ_OK) {
|
||||
nread = BZ2_bzRead ( &bzerr, bzf, obuf, 5000 );
|
||||
if (bzerr == BZ_DATA_ERROR_MAGIC) goto errhandler;
|
||||
if (bzerr == BZ_DATA_ERROR_MAGIC) goto trycat;
|
||||
if ((bzerr == BZ_OK || bzerr == BZ_STREAM_END) && nread > 0)
|
||||
fwrite ( obuf, sizeof(UChar), nread, stream );
|
||||
if (ferror(stream)) goto errhandler_io;
|
||||
@ -668,9 +563,9 @@ Bool uncompressStream ( FILE *zStream, FILE *stream )
|
||||
if (bzerr != BZ_OK) panic ( "decompress:bzReadGetUnused" );
|
||||
|
||||
if (nUnused == 0 && myfeof(zStream)) break;
|
||||
|
||||
}
|
||||
|
||||
closeok:
|
||||
if (ferror(zStream)) goto errhandler_io;
|
||||
ret = fclose ( zStream );
|
||||
if (ret == EOF) goto errhandler_io;
|
||||
@ -680,11 +575,26 @@ Bool uncompressStream ( FILE *zStream, FILE *stream )
|
||||
if (ret != 0) goto errhandler_io;
|
||||
if (stream != stdout) {
|
||||
ret = fclose ( stream );
|
||||
outputHandleJustInCase = NULL;
|
||||
if (ret == EOF) goto errhandler_io;
|
||||
}
|
||||
outputHandleJustInCase = NULL;
|
||||
if (verbosity >= 2) fprintf ( stderr, "\n " );
|
||||
return True;
|
||||
|
||||
trycat:
|
||||
if (forceOverwrite) {
|
||||
rewind(zStream);
|
||||
while (True) {
|
||||
if (myfeof(zStream)) break;
|
||||
nread = fread ( obuf, sizeof(UChar), 5000, zStream );
|
||||
if (ferror(zStream)) goto errhandler_io;
|
||||
if (nread > 0) fwrite ( obuf, sizeof(UChar), nread, stream );
|
||||
if (ferror(stream)) goto errhandler_io;
|
||||
}
|
||||
goto closeok;
|
||||
}
|
||||
|
||||
errhandler:
|
||||
BZ2_bzReadClose ( &bzerr_dummy, bzf );
|
||||
switch (bzerr) {
|
||||
@ -832,7 +742,7 @@ void cadvise ( void )
|
||||
stderr,
|
||||
"\nIt is possible that the compressed file(s) have become corrupted.\n"
|
||||
"You can use the -tvv option to test integrity of such files.\n\n"
|
||||
"You can use the `bzip2recover' program to *attempt* to recover\n"
|
||||
"You can use the `bzip2recover' program to attempt to recover\n"
|
||||
"data from undamaged sections of corrupted files.\n\n"
|
||||
);
|
||||
}
|
||||
@ -856,27 +766,54 @@ static
|
||||
void cleanUpAndFail ( Int32 ec )
|
||||
{
|
||||
IntNative retVal;
|
||||
struct MY_STAT statBuf;
|
||||
|
||||
if ( srcMode == SM_F2F
|
||||
&& opMode != OM_TEST
|
||||
&& deleteOutputOnInterrupt ) {
|
||||
|
||||
/* Check whether input file still exists. Delete output file
|
||||
only if input exists to avoid loss of data. Joerg Prante, 5
|
||||
January 2002. (JRS 06-Jan-2002: other changes in 1.0.2 mean
|
||||
this is less likely to happen. But to be ultra-paranoid, we
|
||||
do the check anyway.) */
|
||||
retVal = MY_STAT ( inName, &statBuf );
|
||||
if (retVal == 0) {
|
||||
if (noisy)
|
||||
fprintf ( stderr, "%s: Deleting output file %s, if it exists.\n",
|
||||
fprintf ( stderr,
|
||||
"%s: Deleting output file %s, if it exists.\n",
|
||||
progName, outName );
|
||||
if (outputHandleJustInCase != NULL)
|
||||
fclose ( outputHandleJustInCase );
|
||||
retVal = remove ( outName );
|
||||
if (retVal != 0)
|
||||
fprintf ( stderr,
|
||||
"%s: WARNING: deletion of output file (apparently) failed.\n",
|
||||
"%s: WARNING: deletion of output file "
|
||||
"(apparently) failed.\n",
|
||||
progName );
|
||||
} else {
|
||||
fprintf ( stderr,
|
||||
"%s: WARNING: deletion of output file suppressed\n",
|
||||
progName );
|
||||
fprintf ( stderr,
|
||||
"%s: since input file no longer exists. Output file\n",
|
||||
progName );
|
||||
fprintf ( stderr,
|
||||
"%s: `%s' may be incomplete.\n",
|
||||
progName, outName );
|
||||
fprintf ( stderr,
|
||||
"%s: I suggest doing an integrity test (bzip2 -tv)"
|
||||
" of it.\n",
|
||||
progName );
|
||||
}
|
||||
}
|
||||
|
||||
if (noisy && numFileNames > 0 && numFilesProcessed < numFileNames) {
|
||||
fprintf ( stderr,
|
||||
"%s: WARNING: some files have not been processed:\n"
|
||||
"\t%d specified on command line, %d not processed yet.\n\n",
|
||||
progName, numFileNames,
|
||||
numFileNames - numFilesProcessed );
|
||||
"%s: %d specified on command line, %d not processed yet.\n\n",
|
||||
progName, progName,
|
||||
numFileNames, numFileNames - numFilesProcessed );
|
||||
}
|
||||
setExit(ec);
|
||||
exit(exitValue);
|
||||
@ -915,6 +852,7 @@ void crcError ( void )
|
||||
static
|
||||
void compressedStreamEOF ( void )
|
||||
{
|
||||
if (noisy) {
|
||||
fprintf ( stderr,
|
||||
"\n%s: Compressed file ends unexpectedly;\n\t"
|
||||
"perhaps it is corrupted? *Possible* reason follows.\n",
|
||||
@ -922,6 +860,7 @@ void compressedStreamEOF ( void )
|
||||
perror ( progName );
|
||||
showFileNames();
|
||||
cadvise();
|
||||
}
|
||||
cleanUpAndFail( 2 );
|
||||
}
|
||||
|
||||
@ -1038,6 +977,11 @@ void configError ( void )
|
||||
/*--- The main driver machinery ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
/* All rather crufty. The main problem is that input files
|
||||
are stat()d multiple times before use. This should be
|
||||
cleaned up.
|
||||
*/
|
||||
|
||||
/*---------------------------------------------*/
|
||||
static
|
||||
void pad ( Char *s )
|
||||
@ -1081,6 +1025,32 @@ Bool fileExists ( Char* name )
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
/* Open an output file safely with O_EXCL and good permissions.
   This avoids a race condition in versions < 1.0.2, in which
   the file was first opened and then had its interim permissions
   set safely.  We instead use open() to create the file with
   the interim permissions required. (rw- --- ---).

   For non-Unix platforms, if we are not worrying about
   security issues, this simply behaves like fopen.
*/
FILE* fopen_output_safely ( Char* name, const char* mode )
{
#  if BZ_UNIX
   FILE*     fp;
   IntNative fh;
   fh = open(name, O_WRONLY|O_CREAT|O_EXCL, S_IWUSR|S_IRUSR);
   if (fh == -1) return NULL;
   fp = fdopen(fh, mode);
   if (fp == NULL) close(fh);
   return fp;
#  else
   return fopen(name, mode);
#  endif
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
/*--
|
||||
if in doubt, return True
|
||||
@ -1093,7 +1063,7 @@ Bool notAStandardFile ( Char* name )
|
||||
|
||||
i = MY_LSTAT ( name, &statBuf );
|
||||
if (i != 0) return True;
|
||||
if (MY_S_IFREG(statBuf.st_mode)) return False;
|
||||
if (MY_S_ISREG(statBuf.st_mode)) return False;
|
||||
return True;
|
||||
}
|
||||
|
||||
@ -1115,26 +1085,62 @@ Int32 countHardLinks ( Char* name )
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
/* Copy modification date, access date, permissions and owner from the
|
||||
source to destination file. We have to copy this meta-info off
|
||||
into fileMetaInfo before starting to compress / decompress it,
|
||||
because doing it afterwards means we get the wrong access time.
|
||||
|
||||
To complicate matters, in compress() and decompress() below, the
|
||||
sequence of tests preceding the call to saveInputFileMetaInfo()
|
||||
involves calling fileExists(), which in turn establishes its result
|
||||
by attempting to fopen() the file, and if successful, immediately
|
||||
fclose()ing it again. So we have to assume that the fopen() call
|
||||
does not cause the access time field to be updated.
|
||||
|
||||
Reading of the man page for stat() (man 2 stat) on RedHat 7.2 seems
|
||||
to imply that merely doing open() will not affect the access time.
|
||||
Therefore we merely need to hope that the C library only does
|
||||
open() as a result of fopen(), and not any kind of read()-ahead
|
||||
cleverness.
|
||||
|
||||
It sounds pretty fragile to me. Whether this carries across
|
||||
robustly to arbitrary Unix-like platforms (or even works robustly
|
||||
on this one, RedHat 7.2) is unknown to me. Nevertheless ...
|
||||
*/
|
||||
#if BZ_UNIX
|
||||
static
|
||||
void copyDatePermissionsAndOwner ( Char *srcName, Char *dstName )
|
||||
struct MY_STAT fileMetaInfo;
|
||||
#endif
|
||||
|
||||
static
|
||||
void saveInputFileMetaInfo ( Char *srcName )
|
||||
{
|
||||
# if BZ_UNIX
|
||||
IntNative retVal;
|
||||
/* Note use of stat here, not lstat. */
|
||||
retVal = MY_STAT( srcName, &fileMetaInfo );
|
||||
ERROR_IF_NOT_ZERO ( retVal );
|
||||
# endif
|
||||
}
|
||||
|
||||
|
||||
static
|
||||
void applySavedMetaInfoToOutputFile ( Char *dstName )
|
||||
{
|
||||
# if BZ_UNIX
|
||||
IntNative retVal;
|
||||
struct MY_STAT statBuf;
|
||||
struct utimbuf uTimBuf;
|
||||
|
||||
retVal = MY_LSTAT ( srcName, &statBuf );
|
||||
ERROR_IF_NOT_ZERO ( retVal );
|
||||
uTimBuf.actime = statBuf.st_atime;
|
||||
uTimBuf.modtime = statBuf.st_mtime;
|
||||
uTimBuf.actime = fileMetaInfo.st_atime;
|
||||
uTimBuf.modtime = fileMetaInfo.st_mtime;
|
||||
|
||||
retVal = chmod ( dstName, statBuf.st_mode );
|
||||
retVal = chmod ( dstName, fileMetaInfo.st_mode );
|
||||
ERROR_IF_NOT_ZERO ( retVal );
|
||||
|
||||
retVal = utime ( dstName, &uTimBuf );
|
||||
ERROR_IF_NOT_ZERO ( retVal );
|
||||
|
||||
retVal = chown ( dstName, statBuf.st_uid, statBuf.st_gid );
|
||||
retVal = chown ( dstName, fileMetaInfo.st_uid, fileMetaInfo.st_gid );
|
||||
/* chown() will in many cases return with EPERM, which can
|
||||
be safely ignored.
|
||||
*/
|
||||
@ -1142,26 +1148,23 @@ void copyDatePermissionsAndOwner ( Char *srcName, Char *dstName )
|
||||
}
|
||||
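The save-then-apply pattern introduced here can be sketched in isolation as follows. This is an illustration under POSIX assumptions, not the bzip2 code itself: `save_meta` and `apply_meta` are hypothetical names, and masking the mode with 07777 is a choice made for this sketch.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <utime.h>

/* Hypothetical helper: snapshot the source file's metadata before the
   file is opened for reading. Uses stat(), not lstat(), so symlinks are
   followed. Returns 0 on success, -1 on failure. */
static int save_meta(const char *src, struct stat *saved)
{
    return stat(src, saved);
}

/* Hypothetical helper: copy the saved permissions, timestamps and (where
   permitted) ownership onto the destination file. Returns 0 on success,
   -1 if chmod() or utime() fails. */
static int apply_meta(const char *dst, const struct stat *saved)
{
    struct utimbuf times;
    times.actime  = saved->st_atime;
    times.modtime = saved->st_mtime;

    /* Keep only the permission bits; the file-type bits are not needed. */
    if (chmod(dst, saved->st_mode & 07777) != 0) return -1;
    if (utime(dst, &times) != 0) return -1;

    /* chown() commonly fails with EPERM for unprivileged users, so a
       failure here is deliberately ignored. */
    if (chown(dst, saved->st_uid, saved->st_gid) != 0) {
        /* ignored on purpose */
    }
    return 0;
}
```

The point of taking the snapshot first is that the timestamps applied to the output are the ones the input had before compression started reading it.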
|
||||
|
||||
/*---------------------------------------------*/
|
||||
static
|
||||
void setInterimPermissions ( Char *dstName )
|
||||
{
|
||||
#if BZ_UNIX
|
||||
IntNative retVal;
|
||||
retVal = chmod ( dstName, S_IRUSR | S_IWUSR );
|
||||
ERROR_IF_NOT_ZERO ( retVal );
|
||||
#endif
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
static
|
||||
Bool containsDubiousChars ( Char* name )
|
||||
{
|
||||
Bool cdc = False;
|
||||
# if BZ_UNIX
|
||||
/* On unix, files can contain any characters and the file expansion
|
||||
* is performed by the shell.
|
||||
*/
|
||||
return False;
|
||||
# else /* ! BZ_UNIX */
|
||||
/* On non-unix (Win* platforms), wildcard characters are not allowed in
|
||||
* filenames.
|
||||
*/
|
||||
for (; *name != '\0'; name++)
|
||||
if (*name == '?' || *name == '*') cdc = True;
|
||||
return cdc;
|
||||
if (*name == '?' || *name == '*') return True;
|
||||
return False;
|
||||
# endif /* BZ_UNIX */
|
||||
}
|
||||
|
||||
|
||||
@ -1201,6 +1204,7 @@ void compress ( Char *name )
|
||||
FILE *inStr;
|
||||
FILE *outStr;
|
||||
Int32 n, i;
|
||||
struct MY_STAT statBuf;
|
||||
|
||||
deleteOutputOnInterrupt = False;
|
||||
|
||||
@ -1246,6 +1250,16 @@ void compress ( Char *name )
|
||||
return;
|
||||
}
|
||||
}
|
||||
if ( srcMode == SM_F2F || srcMode == SM_F2O ) {
|
||||
MY_STAT(inName, &statBuf);
|
||||
if ( MY_S_ISDIR(statBuf.st_mode) ) {
|
||||
fprintf( stderr,
|
||||
"%s: Input file %s is a directory.\n",
|
||||
progName,inName);
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
}
|
||||
if ( srcMode == SM_F2F && !forceOverwrite && notAStandardFile ( inName )) {
|
||||
if (noisy)
|
||||
fprintf ( stderr, "%s: Input file %s is not a normal file.\n",
|
||||
@ -1253,12 +1267,16 @@ void compress ( Char *name )
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) {
|
||||
if ( srcMode == SM_F2F && fileExists ( outName ) ) {
|
||||
if (forceOverwrite) {
|
||||
remove(outName);
|
||||
} else {
|
||||
fprintf ( stderr, "%s: Output file %s already exists.\n",
|
||||
progName, outName );
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
}
|
||||
if ( srcMode == SM_F2F && !forceOverwrite &&
|
||||
(n=countHardLinks ( inName )) > 0) {
|
||||
fprintf ( stderr, "%s: Input file %s has %d other link%s.\n",
|
||||
@ -1267,6 +1285,12 @@ void compress ( Char *name )
|
||||
return;
|
||||
}
|
||||
|
||||
if ( srcMode == SM_F2F ) {
|
||||
/* Save the file's meta-info before we open it. Doing it later
|
||||
means we mess up the access times. */
|
||||
saveInputFileMetaInfo ( inName );
|
||||
}
|
||||
|
||||
switch ( srcMode ) {
|
||||
|
||||
case SM_I2O:
|
||||
@ -1306,7 +1330,7 @@ void compress ( Char *name )
|
||||
|
||||
case SM_F2F:
|
||||
inStr = fopen ( inName, "rb" );
|
||||
outStr = fopen ( outName, "wb" );
|
||||
outStr = fopen_output_safely ( outName, "wb" );
|
||||
if ( outStr == NULL) {
|
||||
fprintf ( stderr, "%s: Can't create output file %s: %s.\n",
|
||||
progName, outName, strerror(errno) );
|
||||
@ -1321,7 +1345,6 @@ void compress ( Char *name )
|
||||
setExit(1);
|
||||
return;
|
||||
};
|
||||
setInterimPermissions ( outName );
|
||||
break;
|
||||
|
||||
default:
|
||||
@ -1343,7 +1366,7 @@ void compress ( Char *name )
|
||||
|
||||
/*--- If there was an I/O error, we won't get here. ---*/
|
||||
if ( srcMode == SM_F2F ) {
|
||||
copyDatePermissionsAndOwner ( inName, outName );
|
||||
applySavedMetaInfoToOutputFile ( outName );
|
||||
deleteOutputOnInterrupt = False;
|
||||
if ( !keepInputFiles ) {
|
||||
IntNative retVal = remove ( inName );
|
||||
@ -1364,6 +1387,7 @@ void uncompress ( Char *name )
|
||||
Int32 n, i;
|
||||
Bool magicNumberOK;
|
||||
Bool cantGuess;
|
||||
struct MY_STAT statBuf;
|
||||
|
||||
deleteOutputOnInterrupt = False;
|
||||
|
||||
@ -1405,6 +1429,16 @@ void uncompress ( Char *name )
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
if ( srcMode == SM_F2F || srcMode == SM_F2O ) {
|
||||
MY_STAT(inName, &statBuf);
|
||||
if ( MY_S_ISDIR(statBuf.st_mode) ) {
|
||||
fprintf( stderr,
|
||||
"%s: Input file %s is a directory.\n",
|
||||
progName,inName);
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
}
|
||||
if ( srcMode == SM_F2F && !forceOverwrite && notAStandardFile ( inName )) {
|
||||
if (noisy)
|
||||
fprintf ( stderr, "%s: Input file %s is not a normal file.\n",
|
||||
@ -1419,12 +1453,16 @@ void uncompress ( Char *name )
|
||||
progName, inName, outName );
|
||||
/* just a warning, no return */
|
||||
}
|
||||
if ( srcMode == SM_F2F && !forceOverwrite && fileExists ( outName ) ) {
|
||||
if ( srcMode == SM_F2F && fileExists ( outName ) ) {
|
||||
if (forceOverwrite) {
|
||||
remove(outName);
|
||||
} else {
|
||||
fprintf ( stderr, "%s: Output file %s already exists.\n",
|
||||
progName, outName );
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
}
|
||||
if ( srcMode == SM_F2F && !forceOverwrite &&
|
||||
(n=countHardLinks ( inName ) ) > 0) {
|
||||
fprintf ( stderr, "%s: Input file %s has %d other link%s.\n",
|
||||
@ -1433,6 +1471,12 @@ void uncompress ( Char *name )
|
||||
return;
|
||||
}
|
||||
|
||||
if ( srcMode == SM_F2F ) {
|
||||
/* Save the file's meta-info before we open it. Doing it later
|
||||
means we mess up the access times. */
|
||||
saveInputFileMetaInfo ( inName );
|
||||
}
|
||||
|
||||
switch ( srcMode ) {
|
||||
|
||||
case SM_I2O:
|
||||
@ -1463,7 +1507,7 @@ void uncompress ( Char *name )
|
||||
|
||||
case SM_F2F:
|
||||
inStr = fopen ( inName, "rb" );
|
||||
outStr = fopen ( outName, "wb" );
|
||||
outStr = fopen_output_safely ( outName, "wb" );
|
||||
if ( outStr == NULL) {
|
||||
fprintf ( stderr, "%s: Can't create output file %s: %s.\n",
|
||||
progName, outName, strerror(errno) );
|
||||
@ -1478,7 +1522,6 @@ void uncompress ( Char *name )
|
||||
setExit(1);
|
||||
return;
|
||||
};
|
||||
setInterimPermissions ( outName );
|
||||
break;
|
||||
|
||||
default:
|
||||
@ -1501,7 +1544,7 @@ void uncompress ( Char *name )
|
||||
/*--- If there was an I/O error, we won't get here. ---*/
|
||||
if ( magicNumberOK ) {
|
||||
if ( srcMode == SM_F2F ) {
|
||||
copyDatePermissionsAndOwner ( inName, outName );
|
||||
applySavedMetaInfoToOutputFile ( outName );
|
||||
deleteOutputOnInterrupt = False;
|
||||
if ( !keepInputFiles ) {
|
||||
IntNative retVal = remove ( inName );
|
||||
@ -1539,6 +1582,7 @@ void testf ( Char *name )
|
||||
{
|
||||
FILE *inStr;
|
||||
Bool allOK;
|
||||
struct MY_STAT statBuf;
|
||||
|
||||
deleteOutputOnInterrupt = False;
|
||||
|
||||
@ -1565,6 +1609,16 @@ void testf ( Char *name )
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
if ( srcMode != SM_I2O ) {
|
||||
MY_STAT(inName, &statBuf);
|
||||
if ( MY_S_ISDIR(statBuf.st_mode) ) {
|
||||
fprintf( stderr,
|
||||
"%s: Input file %s is a directory.\n",
|
||||
progName,inName);
|
||||
setExit(1);
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
switch ( srcMode ) {
|
||||
|
||||
@ -1603,6 +1657,7 @@ void testf ( Char *name )
|
||||
}
|
||||
|
||||
/*--- Now the input handle is sane. Do the Biz. ---*/
|
||||
outputHandleJustInCase = NULL;
|
||||
allOK = testStream ( inStr );
|
||||
|
||||
if (allOK && verbosity >= 1) fprintf ( stderr, "ok\n" );
|
||||
@ -1619,7 +1674,7 @@ void license ( void )
|
||||
"bzip2, a block-sorting file compressor. "
|
||||
"Version %s.\n"
|
||||
" \n"
|
||||
" Copyright (C) 1996-2000 by Julian Seward.\n"
|
||||
" Copyright (C) 1996-2002 by Julian Seward.\n"
|
||||
" \n"
|
||||
" This program is free software; you can redistribute it and/or modify\n"
|
||||
" it under the terms set out in the LICENSE file, which is included\n"
|
||||
@ -1658,6 +1713,8 @@ void usage ( Char *fullProgName )
|
||||
" -V --version display software version & license\n"
|
||||
" -s --small use less memory (at most 2500k)\n"
|
||||
" -1 .. -9 set block size to 100k .. 900k\n"
|
||||
" --fast alias for -1\n"
|
||||
" --best alias for -9\n"
|
||||
"\n"
|
||||
" If invoked as `bzip2', default action is to compress.\n"
|
||||
" as `bunzip2', default action is to decompress.\n"
|
||||
@ -1933,6 +1990,8 @@ IntNative main ( IntNative argc, Char *argv[] )
|
||||
if (ISFLAG("--exponential")) workFactor = 1; else
|
||||
if (ISFLAG("--repetitive-best")) redundant(aa->name); else
|
||||
if (ISFLAG("--repetitive-fast")) redundant(aa->name); else
|
||||
if (ISFLAG("--fast")) blockSize100k = 1; else
|
||||
if (ISFLAG("--best")) blockSize100k = 9; else
|
||||
if (ISFLAG("--verbose")) verbosity++; else
|
||||
if (ISFLAG("--help")) { usage ( progName ); exit ( 0 ); }
|
||||
else
|
||||
|
132
bzip2.txt
@ -1,7 +1,6 @@
|
||||
|
||||
|
||||
NAME
|
||||
bzip2, bunzip2 - a block-sorting file compressor, v1.0
|
||||
bzip2, bunzip2 - a block-sorting file compressor, v1.0.2
|
||||
bzcat - decompresses files to stdout
|
||||
bzip2recover - recovers data from damaged bzip2 files
|
||||
|
||||
@ -18,20 +17,20 @@ DESCRIPTION
|
||||
sorting text compression algorithm, and Huffman coding.
|
||||
Compression is generally considerably better than that
|
||||
achieved by more conventional LZ77/LZ78-based compressors,
|
||||
and approaches the performance of the PPM family of sta-
|
||||
and approaches the performance of the PPM family of sta
|
||||
tistical compressors.
|
||||
|
||||
The command-line options are deliberately very similar to
|
||||
those of GNU gzip, but they are not identical.
|
||||
|
||||
bzip2 expects a list of file names to accompany the com-
|
||||
bzip2 expects a list of file names to accompany the com
|
||||
mand-line flags. Each file is replaced by a compressed
|
||||
version of itself, with the name "original_name.bz2".
|
||||
Each compressed file has the same modification date, per-
|
||||
missions, and, when possible, ownership as the correspond-
|
||||
Each compressed file has the same modification date, per
|
||||
missions, and, when possible, ownership as the correspond
|
||||
ing original, so that these properties can be correctly
|
||||
restored at decompression time. File name handling is
|
||||
naive in the sense that there is no mechanism for preserv-
|
||||
naive in the sense that there is no mechanism for preserv
|
||||
ing original file names, permissions, ownerships or dates
|
||||
in filesystems which lack these concepts, or have serious
|
||||
file name length restrictions, such as MS-DOS.
|
||||
@ -62,23 +61,23 @@ DESCRIPTION
|
||||
guess the name of the original file, and uses the original
|
||||
name with .out appended.
|
||||
|
||||
As with compression, supplying no filenames causes decom-
|
||||
As with compression, supplying no filenames causes decom
|
||||
pression from standard input to standard output.
|
||||
|
||||
bunzip2 will correctly decompress a file which is the con-
|
||||
bunzip2 will correctly decompress a file which is the con
|
||||
catenation of two or more compressed files. The result is
|
||||
the concatenation of the corresponding uncompressed files.
|
||||
Integrity testing (-t) of concatenated compressed files is
|
||||
also supported.
|
||||
|
||||
You can also compress or decompress files to the standard
|
||||
output by giving the -c flag. Multiple files may be com-
|
||||
output by giving the -c flag. Multiple files may be com
|
||||
pressed and decompressed like this. The resulting outputs
|
||||
are fed sequentially to stdout. Compression of multiple
|
||||
files in this manner generates a stream containing multi-
|
||||
files in this manner generates a stream containing multi
|
||||
ple compressed file representations. Such a stream can be
|
||||
decompressed correctly only by bzip2 version 0.9.0 or
|
||||
later. Earlier versions of bzip2 will stop after decom-
|
||||
later. Earlier versions of bzip2 will stop after decom
|
||||
pressing the first file in the stream.
|
||||
|
||||
bzcat (or bzip2 -dc) decompresses all specified files to
|
||||
@ -99,7 +98,7 @@ DESCRIPTION
|
||||
|
||||
As a self-check for your protection, bzip2 uses 32-bit
|
||||
CRCs to make sure that the decompressed version of a file
|
||||
is identical to the original. This guards against corrup-
|
||||
is identical to the original. This guards against corrup
|
||||
tion of the compressed data, and against undetected bugs
|
||||
in bzip2 (hopefully very unlikely). The chance of data
|
||||
corruption going undetected is microscopic, about one
|
||||
@ -127,8 +126,8 @@ OPTIONS
|
||||
and forces bzip2 to decompress.
|
||||
|
||||
-z --compress
|
||||
The complement to -d: forces compression, regard-
|
||||
less of the invokation name.
|
||||
The complement to -d: forces compression,
|
||||
regardless of the invocation name.
|
||||
|
||||
-t --test
|
||||
Check integrity of the specified file(s), but don't
|
||||
@ -141,6 +140,11 @@ OPTIONS
|
||||
forces bzip2 to break hard links to files, which it
|
||||
otherwise wouldn't do.
|
||||
|
||||
bzip2 normally declines to decompress files which
|
||||
don't have the correct magic header bytes. If
|
||||
forced (-f), however, it will pass such files
|
||||
through unmodified. This is how GNU gzip behaves.
|
||||
|
||||
-k --keep
|
||||
Keep (don't delete) input files during compression
|
||||
or decompression.
|
||||
@ -167,7 +171,7 @@ OPTIONS
|
||||
|
||||
-v --verbose
|
||||
Verbose mode -- show the compression ratio for each
|
||||
file processed. Further -v's increase the ver-
|
||||
file processed. Further -v's increase the ver
|
||||
bosity level, spewing out lots of information which
|
||||
is primarily of interest for diagnostic purposes.
|
||||
|
||||
@ -175,20 +179,24 @@ OPTIONS
|
||||
Display the software version, license terms and
|
||||
conditions.
|
||||
|
||||
-1 to -9
|
||||
-1 (or --fast) to -9 (or --best)
|
||||
Set the block size to 100 k, 200 k .. 900 k when
|
||||
compressing. Has no effect when decompressing.
|
||||
See MEMORY MANAGEMENT below.
|
||||
See MEMORY MANAGEMENT below. The --fast and --best
|
||||
aliases are primarily for GNU gzip compatibility.
|
||||
In particular, --fast doesn't make things signifi
|
||||
cantly faster. And --best merely selects the
|
||||
default behaviour.
|
||||
|
||||
-- Treats all subsequent arguments as file names, even
|
||||
if they start with a dash. This is so you can han-
|
||||
if they start with a dash. This is so you can han
|
||||
dle files with names beginning with a dash, for
|
||||
example: bzip2 -- -myfilename.
|
||||
|
||||
--repetitive-fast --repetitive-best
|
||||
These flags are redundant in versions 0.9.5 and
|
||||
above. They provided some coarse control over the
|
||||
behaviour of the sorting algorithm in earlier ver-
|
||||
behaviour of the sorting algorithm in earlier ver
|
||||
sions, which was sometimes useful. 0.9.5 and above
|
||||
have an improved algorithm which renders these
|
||||
flags irrelevant.
|
||||
@ -199,7 +207,7 @@ MEMORY MANAGEMENT
|
||||
affects both the compression ratio achieved, and the
|
||||
amount of memory needed for compression and decompression.
|
||||
The flags -1 through -9 specify the block size to be
|
||||
100,000 bytes through 900,000 bytes (the default) respec-
|
||||
100,000 bytes through 900,000 bytes (the default) respec
|
||||
tively. At decompression time, the block size used for
|
||||
compression is read from the header of the compressed
|
||||
file, and bunzip2 then allocates itself just enough memory
|
||||
@ -227,13 +235,13 @@ MEMORY MANAGEMENT
|
||||
bunzip2 will require about 3700 kbytes to decompress. To
|
||||
support decompression of any file on a 4 megabyte machine,
|
||||
bunzip2 has an option to decompress using approximately
|
||||
half this amount of memory, about 2300 kbytes. Decompres-
|
||||
half this amount of memory, about 2300 kbytes. Decompres
|
||||
sion speed is also halved, so you should use this option
|
||||
only where necessary. The relevant flag is -s.
|
||||
|
||||
In general, try and use the largest block size memory con-
|
||||
In general, try and use the largest block size memory con
|
||||
straints allow, since that maximises the compression
|
||||
achieved. Compression and decompression speed are virtu-
|
||||
achieved. Compression and decompression speed are virtu
|
||||
ally unaffected by block size.
|
||||
|
||||
Another significant point applies to files which fit in a
|
||||
@ -249,11 +257,11 @@ MEMORY MANAGEMENT
|
||||
|
||||
Here is a table which summarises the maximum memory usage
|
||||
for different block sizes. Also recorded is the total
|
||||
compressed size for 14 files of the Calgary Text Compres-
|
||||
compressed size for 14 files of the Calgary Text Compres
|
||||
sion Corpus totalling 3,141,622 bytes. This column gives
|
||||
some feel for how compression varies with block size.
|
||||
These figures tend to understate the advantage of larger
|
||||
block sizes for larger files, since the Corpus is domi-
|
||||
block sizes for larger files, since the Corpus is domi
|
||||
nated by smaller files.
|
||||
|
||||
Compress Decompress Decompress Corpus
|
||||
@ -272,7 +280,7 @@ MEMORY MANAGEMENT
|
||||
|
||||
RECOVERING DATA FROM DAMAGED FILES
|
||||
bzip2 compresses files in blocks, usually 900kbytes long.
|
||||
Each block is handled independently. If a media or trans-
|
||||
Each block is handled independently. If a media or trans
|
||||
mission error causes a multi-block .bz2 file to become
|
||||
damaged, it may be possible to recover data from the
|
||||
undamaged blocks in the file.
|
||||
@ -289,19 +297,19 @@ RECOVERING DATA FROM DAMAGED FILES
|
||||
the integrity of the resulting files, and decompress those
|
||||
which are undamaged.
|
||||
|
||||
bzip2recover takes a single argument, the name of the dam-
|
||||
aged file, and writes a number of files "rec0001file.bz2",
|
||||
"rec0002file.bz2", etc, containing the extracted blocks.
|
||||
The output filenames are designed so that the use of
|
||||
wildcards in subsequent processing -- for example, "bzip2
|
||||
-dc rec*file.bz2 > recovered_data" -- lists the files in
|
||||
the correct order.
|
||||
bzip2recover takes a single argument, the name of the dam
|
||||
aged file, and writes a number of files
|
||||
"rec00001file.bz2", "rec00002file.bz2", etc, containing
|
||||
the extracted blocks. The output filenames are
|
||||
designed so that the use of wildcards in subsequent pro
|
||||
cessing -- for example, "bzip2 -dc rec*file.bz2 > recov
|
||||
ered_data" -- processes the files in the correct order.
|
||||
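Wildcard expansion sorts names lexicographically, so fixed-width, zero-padded block numbers are what make "rec*file.bz2" come out in block order. A small sketch of that naming scheme follows. `make_rec_name` is a hypothetical helper written for this note, and the way bzip2recover itself builds the names may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "recNNNNN<base>" where NNNNN is the block
   number zero-padded to five digits. Returns 0 on success, or -1 if the
   result does not fit in `out`. */
static int make_rec_name(char *out, size_t outSize, int blockNo,
                         const char *base)
{
    int n = snprintf(out, outSize, "rec%05d%s", blockNo, base);
    return (n >= 0 && (size_t)n < outSize) ? 0 : -1;
}
```

With padding, "rec00002file.bz2" sorts before "rec00010file.bz2". Without it, "rec10..." would sort before "rec2...", and the concatenated output would be out of order.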
|
||||
bzip2recover should be of most use dealing with large .bz2
|
||||
files, as these will contain many blocks. It is clearly
|
||||
futile to use it on damaged single-block files, since a
|
||||
damaged block cannot be recovered. If you wish to min-
|
||||
imise any potential data loss through media or transmis-
|
||||
damaged block cannot be recovered. If you wish to min
|
||||
imise any potential data loss through media or transmis
|
||||
sion errors, you might consider compressing with a smaller
|
||||
block size.
|
||||
|
||||
@ -315,19 +323,19 @@ PERFORMANCE NOTES
|
||||
better than previous versions in this respect. The ratio
|
||||
between worst-case and average-case compression time is in
|
||||
the region of 10:1. For previous versions, this figure
|
||||
was more like 100:1. You can use the -vvvv option to mon-
|
||||
was more like 100:1. You can use the -vvvv option to mon
|
||||
itor progress in great detail, if you want.
|
||||
|
||||
Decompression speed is unaffected by these phenomena.
|
||||
|
||||
bzip2 usually allocates several megabytes of memory to
|
||||
operate in, and then charges all over it in a fairly ran-
|
||||
dom fashion. This means that performance, both for com-
|
||||
operate in, and then charges all over it in a fairly ran
|
||||
dom fashion. This means that performance, both for com
|
||||
pressing and decompressing, is largely determined by the
|
||||
speed at which your machine can service cache misses.
|
||||
Because of this, small changes to the code to reduce the
|
||||
miss rate have been observed to give disproportionately
|
||||
large performance improvements. I imagine bzip2 will per-
|
||||
large performance improvements. I imagine bzip2 will per
|
||||
form best on machines with very large caches.
|
||||
|
||||
|
||||
@ -337,40 +345,46 @@ CAVEATS
|
||||
but the details of what the problem is sometimes seem
|
||||
rather misleading.
|
||||
|
||||
This manual page pertains to version 1.0 of bzip2. Com-
|
||||
This manual page pertains to version 1.0.2 of bzip2. Com
|
||||
pressed data created by this version is entirely forwards
|
||||
and backwards compatible with the previous public
|
||||
releases, versions 0.1pl2, 0.9.0 and 0.9.5, but with the
|
||||
following exception: 0.9.0 and above can correctly decom-
|
||||
press multiple concatenated compressed files. 0.1pl2 can-
|
||||
not do this; it will stop after decompressing just the
|
||||
first file in the stream.
|
||||
releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
|
||||
but with the following exception: 0.9.0 and above can cor
|
||||
rectly decompress multiple concatenated compressed files.
|
||||
0.1pl2 cannot do this; it will stop after decompressing
|
||||
just the first file in the stream.
|
||||
|
||||
bzip2recover uses 32-bit integers to represent bit posi-
|
||||
tions in compressed files, so it cannot handle compressed
|
||||
files more than 512 megabytes long. This could easily be
|
||||
fixed.
|
||||
bzip2recover versions prior to this one, 1.0.2, used
|
||||
32-bit integers to represent bit positions in compressed
|
||||
files, so it could not handle compressed files more than
|
||||
512 megabytes long. Version 1.0.2 and above uses 64-bit
|
||||
ints on some platforms which support them (GNU supported
|
||||
targets, and Windows). To establish whether or not
|
||||
bzip2recover was built with such a limitation, run it
|
||||
without arguments. In any event you can build yourself an
|
||||
unlimited version if you can recompile it with MaybeUInt64
|
||||
set to be an unsigned 64-bit integer.
|
||||
|
||||
|
||||
AUTHOR
|
||||
Julian Seward, jseward@acm.org.
|
||||
|
||||
http://sourceware.cygnus.com/bzip2
|
||||
http://www.muraroa.demon.co.uk
|
||||
http://sources.redhat.com/bzip2
|
||||
|
||||
The ideas embodied in bzip2 are due to (at least) the fol-
|
||||
The ideas embodied in bzip2 are due to (at least) the fol
|
||||
lowing people: Michael Burrows and David Wheeler (for the
|
||||
block sorting transformation), David Wheeler (again, for
|
||||
the Huffman coder), Peter Fenwick (for the structured cod-
|
||||
the Huffman coder), Peter Fenwick (for the structured cod
|
||||
ing model in the original bzip, and many refinements), and
|
||||
Alistair Moffat, Radford Neal and Ian Witten (for the
|
||||
arithmetic coder in the original bzip). I am much
|
||||
indebted for their help, support and advice. See the man-
|
||||
indebted for their help, support and advice. See the man
|
||||
ual in the source distribution for pointers to sources of
|
||||
documentation. Christian von Roques encouraged me to look
|
||||
for faster sorting algorithms, so as to speed up compres-
|
||||
for faster sorting algorithms, so as to speed up compres
|
||||
sion. Bela Lubkin encouraged me to improve the worst-case
|
||||
compression performance. Many people sent patches, helped
|
||||
with portability problems, lent machines, gave advice and
|
||||
were generally helpful.
|
||||
compression performance. The bz* scripts are derived from
|
||||
those of GNU gzip. Many people sent patches, helped with
|
||||
portability problems, lent machines, gave advice and were
|
||||
generally helpful.
|
||||
|
||||
|
161
bzip2recover.c
@ -9,7 +9,7 @@
|
||||
salvage from damaged files created by the accompanying
|
||||
bzip2-1.0 program.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -57,6 +57,29 @@
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
|
||||
/* This program records bit locations in the file to be recovered.
|
||||
That means that if 64-bit ints are not supported, we will not
|
||||
be able to recover .bz2 files over 512MB (2^32 bits) long.
|
||||
On GNU supported platforms, we take advantage of the 64-bit
|
||||
int support to circumvent this problem. Ditto MSVC.
|
||||
|
||||
This change occurred in version 1.0.2; all prior versions have
|
||||
the 512MB limitation.
|
||||
*/
|
||||
#ifdef __GNUC__
|
||||
typedef unsigned long long int MaybeUInt64;
|
||||
# define MaybeUInt64_FMT "%llu"
|
||||
#else
|
||||
#ifdef _MSC_VER
|
||||
typedef unsigned __int64 MaybeUInt64;
|
||||
# define MaybeUInt64_FMT "%I64u"
|
||||
#else
|
||||
typedef unsigned int MaybeUInt64;
|
||||
# define MaybeUInt64_FMT "%u"
|
||||
#endif
|
||||
#endif
|
||||
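The 512MB figure follows directly from the counter width: a 32-bit counter can address 2^32 bit positions, which is 2^32 / 8 = 2^29 bytes. The sketch below works that out and shows a portable way to print a 64-bit position. It assumes a C99 compiler with <stdint.h> and <inttypes.h>, which is an assumption of this note and not what the commit uses; the function names are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the number of bytes whose bit positions can all be
   counted by an unsigned integer `bits` wide. Valid for 3 <= bits <= 64. */
static uint64_t max_bytes_for_bit_counter(unsigned bits)
{
    /* 2^bits bit positions at 8 bits per byte gives 2^(bits-3) bytes. */
    return (uint64_t)1 << (bits - 3);
}

/* Hypothetical helper: format a bit position using the PRIu64 macro in
   place of a compiler-specific format string. Returns the snprintf()
   result. */
static int format_position(char *buf, size_t size, uint64_t bitPos)
{
    return snprintf(buf, size, "%" PRIu64, bitPos);
}
```

For a 32-bit counter this gives 536,870,912 bytes, i.e. 512MB, matching the limit described in the comment above.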
|
||||
typedef unsigned int UInt32;
|
||||
typedef int Int32;
|
||||
typedef unsigned char UChar;
|
||||
@ -66,12 +89,24 @@ typedef unsigned char Bool;
|
||||
#define False ((Bool)0)
|
||||
|
||||
|
||||
Char inFileName[2000];
|
||||
Char outFileName[2000];
|
||||
Char progName[2000];
|
||||
#define BZ_MAX_FILENAME 2000
|
||||
|
||||
UInt32 bytesOut = 0;
|
||||
UInt32 bytesIn = 0;
|
||||
Char inFileName[BZ_MAX_FILENAME];
|
||||
Char outFileName[BZ_MAX_FILENAME];
|
||||
Char progName[BZ_MAX_FILENAME];
|
||||
|
||||
MaybeUInt64 bytesOut = 0;
|
||||
MaybeUInt64 bytesIn = 0;
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
/*--- Header bytes ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
#define BZ_HDR_B 0x42 /* 'B' */
|
||||
#define BZ_HDR_Z 0x5a /* 'Z' */
|
||||
#define BZ_HDR_h 0x68 /* 'h' */
|
||||
#define BZ_HDR_0 0x30 /* '0' */
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
@ -116,6 +151,23 @@ void mallocFail ( Int32 n )
|
||||
}
|
||||
|
||||
|
||||
/*---------------------------------------------*/
|
||||
void tooManyBlocks ( Int32 max_handled_blocks )
|
||||
{
|
||||
fprintf ( stderr,
|
||||
"%s: `%s' appears to contain more than %d blocks\n",
|
||||
progName, inFileName, max_handled_blocks );
|
||||
fprintf ( stderr,
|
||||
"%s: and cannot be handled. To fix, increase\n",
|
||||
progName );
|
||||
fprintf ( stderr,
|
||||
"%s: BZ_MAX_HANDLED_BLOCKS in bzip2recover.c, and recompile.\n",
|
||||
progName );
|
||||
exit ( 1 );
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*---------------------------------------------------*/
|
||||
/*--- Bit stream I/O ---*/
|
||||
/*---------------------------------------------------*/
|
||||
@ -254,27 +306,37 @@ Bool endsInBz2 ( Char* name )
|
||||
/*--- ---*/
|
||||
/*---------------------------------------------------*/
|
||||
|
||||
/* This logic isn't really right when it comes to Cygwin. */
|
||||
#ifdef _WIN32
|
||||
# define BZ_SPLIT_SYM '\\' /* path splitter on Windows platform */
|
||||
#else
|
||||
# define BZ_SPLIT_SYM '/' /* path splitter on Unix platform */
|
||||
#endif
|
||||
|
||||
#define BLOCK_HEADER_HI 0x00003141UL
|
||||
#define BLOCK_HEADER_LO 0x59265359UL
|
||||
|
||||
#define BLOCK_ENDMARK_HI 0x00001772UL
|
||||
#define BLOCK_ENDMARK_LO 0x45385090UL
|
||||
|
||||
/* Increase if necessary. However, a .bz2 file with > 50000 blocks
|
||||
would have an uncompressed size of at least 40GB, so the chances
|
||||
are low you'll need to up this.
|
||||
*/
|
||||
#define BZ_MAX_HANDLED_BLOCKS 50000
|
||||
|
||||
UInt32 bStart[20000];
|
||||
UInt32 bEnd[20000];
|
||||
UInt32 rbStart[20000];
|
||||
UInt32 rbEnd[20000];
|
||||
MaybeUInt64 bStart [BZ_MAX_HANDLED_BLOCKS];
|
||||
MaybeUInt64 bEnd [BZ_MAX_HANDLED_BLOCKS];
|
||||
MaybeUInt64 rbStart[BZ_MAX_HANDLED_BLOCKS];
|
||||
MaybeUInt64 rbEnd [BZ_MAX_HANDLED_BLOCKS];
|
||||
|
||||
Int32 main ( Int32 argc, Char** argv )
|
||||
{
|
||||
FILE* inFile;
|
||||
FILE* outFile;
|
||||
BitStream* bsIn, *bsWr;
|
||||
Int32 currBlock, b, wrBlock;
|
||||
UInt32 bitsRead;
|
||||
Int32 rbCtr;
|
||||
|
||||
Int32 b, wrBlock, currBlock, rbCtr;
|
||||
MaybeUInt64 bitsRead;
|
||||
|
||||
UInt32 buffHi, buffLo, blockCRC;
|
||||
Char* p;
|
||||
@ -282,11 +344,37 @@ Int32 main ( Int32 argc, Char** argv )
|
||||
strcpy ( progName, argv[0] );
|
||||
inFileName[0] = outFileName[0] = 0;
|
||||
|
||||
fprintf ( stderr, "bzip2recover 1.0: extracts blocks from damaged .bz2 files.\n" );
|
||||
fprintf ( stderr,
|
||||
"bzip2recover 1.0.2: extracts blocks from damaged .bz2 files.\n" );
|
||||
|
||||
if (argc != 2) {
|
||||
fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n",
|
||||
progName, progName );
|
||||
switch (sizeof(MaybeUInt64)) {
|
||||
case 8:
|
||||
fprintf(stderr,
|
||||
"\trestrictions on size of recovered file: None\n");
|
||||
break;
|
||||
case 4:
|
||||
fprintf(stderr,
|
||||
"\trestrictions on size of recovered file: 512 MB\n");
|
||||
fprintf(stderr,
|
||||
"\tto circumvent, recompile with MaybeUInt64 as an\n"
|
||||
"\tunsigned 64-bit int.\n");
|
||||
break;
|
||||
default:
|
||||
fprintf(stderr,
|
||||
"\tsizeof(MaybeUInt64) is not 4 or 8 -- "
|
||||
"configuration error.\n");
|
||||
break;
|
||||
}
|
||||
exit(1);
|
||||
}
|
||||
|
||||
if (strlen(argv[1]) >= BZ_MAX_FILENAME-20) {
|
||||
fprintf ( stderr,
|
||||
"%s: supplied filename is suspiciously (>= %d chars) long. Bye!\n",
|
||||
progName, (int)strlen(argv[1]) );
|
||||
exit(1);
|
||||
}
|
||||
|
||||
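The 512 MB restriction reported in the usage message follows from counting bit positions rather than bytes: a 32-bit counter overflows after 2^32 bits, which is 2^29 bytes. A minimal sketch of that calculation, assuming the counter is either 4 or 8 bytes wide as in the switch above; `max_bytes_for_bit_counter` is a hypothetical name, not something bzip2recover defines.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file size, in bytes, whose bit positions still fit in an
   unsigned counter of the given width. Hypothetical helper. */
uint64_t max_bytes_for_bit_counter(unsigned counter_bytes)
{
    assert(counter_bytes == 4 || counter_bytes == 8);
    if (counter_bytes == 8) {
        /* 2^64 bits / 8 bits per byte = 2^61 bytes */
        return (uint64_t)1 << 61;
    }
    /* 2^32 bits / 8 bits per byte = 2^29 bytes = 512 MiB */
    return (uint64_t)1 << 29;
}
```

The 64-bit case is not literally unlimited, but 2^61 bytes is far beyond any file the tool is likely to meet, which is presumably why the message says "None".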
@ -316,7 +404,8 @@ Int32 main ( Int32 argc, Char** argv )
|
||||
(bitsRead - bStart[currBlock]) >= 40) {
|
||||
bEnd[currBlock] = bitsRead-1;
|
||||
if (currBlock > 0)
|
||||
fprintf ( stderr, " block %d runs from %d to %d (incomplete)\n",
|
||||
fprintf ( stderr, " block %d runs from " MaybeUInt64_FMT
|
||||
" to " MaybeUInt64_FMT " (incomplete)\n",
|
||||
currBlock, bStart[currBlock], bEnd[currBlock] );
|
||||
} else
|
||||
currBlock--;
|
||||
@ -330,17 +419,22 @@ Int32 main ( Int32 argc, Char** argv )
|
||||
( (buffHi & 0x0000ffff) == BLOCK_ENDMARK_HI
|
||||
&& buffLo == BLOCK_ENDMARK_LO)
|
||||
) {
|
||||
if (bitsRead > 49)
|
||||
bEnd[currBlock] = bitsRead-49; else
|
||||
if (bitsRead > 49) {
|
||||
bEnd[currBlock] = bitsRead-49;
|
||||
} else {
|
||||
bEnd[currBlock] = 0;
|
||||
}
|
||||
if (currBlock > 0 &&
|
||||
(bEnd[currBlock] - bStart[currBlock]) >= 130) {
|
||||
fprintf ( stderr, " block %d runs from %d to %d\n",
|
||||
fprintf ( stderr, " block %d runs from " MaybeUInt64_FMT
|
||||
" to " MaybeUInt64_FMT "\n",
|
||||
rbCtr+1, bStart[currBlock], bEnd[currBlock] );
|
||||
rbStart[rbCtr] = bStart[currBlock];
|
||||
rbEnd[rbCtr] = bEnd[currBlock];
|
||||
rbCtr++;
|
||||
}
|
||||
if (currBlock >= BZ_MAX_HANDLED_BLOCKS)
|
||||
tooManyBlocks(BZ_MAX_HANDLED_BLOCKS);
|
||||
currBlock++;
|
||||
|
||||
bStart[currBlock] = bitsRead;
|
||||
@ -400,10 +494,25 @@ Int32 main ( Int32 argc, Char** argv )
|
||||
wrBlock++;
|
||||
} else
|
||||
if (bitsRead == rbStart[wrBlock]) {
|
||||
outFileName[0] = 0;
|
||||
sprintf ( outFileName, "rec%4d", wrBlock+1 );
|
||||
for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0';
|
||||
strcat ( outFileName, inFileName );
|
||||
/* Create the output file name, correctly handling leading paths.
|
||||
(31.10.2001 by Sergey E. Kusikov) */
|
||||
Char* split;
|
||||
Int32 ofs, k;
|
||||
for (k = 0; k < BZ_MAX_FILENAME; k++)
|
||||
outFileName[k] = 0;
|
||||
strcpy (outFileName, inFileName);
|
||||
split = strrchr (outFileName, BZ_SPLIT_SYM);
|
||||
if (split == NULL) {
|
||||
split = outFileName;
|
||||
} else {
|
||||
++split;
|
||||
}
|
||||
/* Now split points to the start of the basename. */
|
||||
ofs = split - outFileName;
|
||||
sprintf (split, "rec%5d", wrBlock+1);
|
||||
for (p = split; *p != 0; p++) if (*p == ' ') *p = '0';
|
||||
strcat (outFileName, inFileName + ofs);
|
||||
|
||||
if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" );
|
||||
|
||||
fprintf ( stderr, " writing block %d to `%s' ...\n",
|
||||
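The file-name change above inserts the `recNNNNN` prefix in front of the basename instead of in front of the whole path, so that `/tmp/data.bz2` yields `/tmp/rec00003data.bz2` rather than `rec0003/tmp/data.bz2`. The following is a sketch of the same idea written as a standalone function with explicit bounds checking. The name `make_rec_name`, its parameters and its return convention are assumptions made for illustration; the code in the commit works in place on `outFileName` instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<dir>recNNNNN<basename>[.bz2]" from an input path.
   sep is the path separator ('/' or '\\').
   Returns 0 on success, -1 if the result does not fit in out[cap].
   Hypothetical helper, not the function used by bzip2recover. */
int make_rec_name(const char *in, int block, char sep, char *out, size_t cap)
{
    const char *base = strrchr(in, sep);
    size_t dirlen, n;
    int w;

    assert(block >= 1 && block <= 99999);
    base = (base == NULL) ? in : base + 1;   /* start of the basename */
    dirlen = (size_t)(base - in);            /* length of the directory part */

    /* %.*s copies the directory prefix; %05d zero-pads the block number */
    w = snprintf(out, cap, "%.*srec%05d%s", (int)dirlen, in, block, base);
    if (w < 0 || (size_t)w >= cap) return -1;

    /* Append ".bz2" unless the name already ends with it */
    n = strlen(out);
    if (n < 4 || strcmp(out + n - 4, ".bz2") != 0) {
        if (n + 4 >= cap) return -1;
        strcat(out, ".bz2");
    }
    return 0;
}
```

Zero-padding to five digits keeps the names in block order under lexicographic sorting, which is what makes a wildcard such as `rec*file.bz2` expand in the right sequence.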
@ -416,8 +525,10 @@ Int32 main ( Int32 argc, Char** argv )
|
||||
exit(1);
|
||||
}
|
||||
bsWr = bsOpenWriteStream ( outFile );
|
||||
bsPutUChar ( bsWr, 'B' ); bsPutUChar ( bsWr, 'Z' );
|
||||
bsPutUChar ( bsWr, 'h' ); bsPutUChar ( bsWr, '9' );
|
||||
bsPutUChar ( bsWr, BZ_HDR_B );
|
||||
bsPutUChar ( bsWr, BZ_HDR_Z );
|
||||
bsPutUChar ( bsWr, BZ_HDR_h );
|
||||
bsPutUChar ( bsWr, BZ_HDR_0 + 9 );
|
||||
bsPutUChar ( bsWr, 0x31 ); bsPutUChar ( bsWr, 0x41 );
|
||||
bsPutUChar ( bsWr, 0x59 ); bsPutUChar ( bsWr, 0x26 );
|
||||
bsPutUChar ( bsWr, 0x53 ); bsPutUChar ( bsWr, 0x59 );
|
||||
|
35
bzlib.c
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -93,10 +93,39 @@ void BZ2_bz__AssertH__fail ( int errcode )
|
||||
"component, you should also report this bug to the author(s)\n"
|
||||
"of that program. Please make an effort to report this bug;\n"
|
||||
"timely and accurate bug reports eventually lead to higher\n"
|
||||
"quality software. Thanks. Julian Seward, 21 March 2000.\n\n",
|
||||
"quality software. Thanks. Julian Seward, 30 December 2001.\n\n",
|
||||
errcode,
|
||||
BZ2_bzlibVersion()
|
||||
);
|
||||
|
||||
if (errcode == 1007) {
|
||||
fprintf(stderr,
|
||||
"\n*** A special note about internal error number 1007 ***\n"
|
||||
"\n"
|
||||
"Experience suggests that a common cause of i.e. 1007\n"
|
||||
"is unreliable memory or other hardware. The 1007 assertion\n"
|
||||
"just happens to cross-check the results of huge numbers of\n"
|
||||
"memory reads/writes, and so acts (unintendedly) as a stress\n"
|
||||
"test of your memory system.\n"
|
||||
"\n"
|
||||
"I suggest the following: try compressing the file again,\n"
|
||||
"possibly monitoring progress in detail with the -vv flag.\n"
|
||||
"\n"
|
||||
"* If the error cannot be reproduced, and/or happens at different\n"
|
||||
" points in compression, you may have a flaky memory system.\n"
|
||||
" Try a memory-test program. I have used Memtest86\n"
|
||||
" (www.memtest86.com). At the time of writing it is free (GPLd).\n"
|
||||
" Memtest86 tests memory much more thoroughly than your BIOS's\n"
|
||||
" power-on test, and may find failures that the BIOS doesn't.\n"
|
||||
"\n"
|
||||
"* If the error can be repeatably reproduced, this is a bug in\n"
|
||||
" bzip2, and I would very much like to hear about it. Please\n"
|
||||
" let me know, and, ideally, save a copy of the file causing the\n"
|
||||
" problem -- without which I will be unable to investigate it.\n"
|
||||
"\n"
|
||||
);
|
||||
}
|
||||
|
||||
exit(3);
|
||||
}
|
||||
#endif
|
||||
@ -1402,7 +1431,7 @@ BZFILE * bzopen_or_bzdopen
|
||||
smallMode = 1; break;
|
||||
default:
|
||||
if (isdigit((int)(*mode))) {
|
||||
blockSize100k = *mode-'0';
|
||||
blockSize100k = *mode-BZ_HDR_0;
|
||||
}
|
||||
}
|
||||
mode++;
|
||||
|
6
bzlib.h
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -110,8 +110,10 @@ typedef
|
||||
#define BZ_EXPORT
|
||||
#endif
|
||||
|
||||
#ifdef _WIN32
|
||||
/* Need a definition for FILE */
|
||||
#include <stdio.h>
|
||||
|
||||
#ifdef _WIN32
|
||||
# include <windows.h>
|
||||
# ifdef small
|
||||
/* windows.h defines small to char */
|
||||
|
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -76,7 +76,7 @@
|
||||
|
||||
/*-- General stuff. --*/
|
||||
|
||||
#define BZ_VERSION "1.0.1, 23-June-2000"
|
||||
#define BZ_VERSION "1.0.2, 30-Dec-2001"
|
||||
|
||||
typedef char Char;
|
||||
typedef unsigned char Bool;
|
||||
@ -137,6 +137,13 @@ extern void bz_internal_error ( int errcode );
|
||||
#define BZFREE(ppp) (strm->bzfree)(strm->opaque,(ppp))
|
||||
|
||||
|
||||
/*-- Header bytes. --*/
|
||||
|
||||
#define BZ_HDR_B 0x42 /* 'B' */
|
||||
#define BZ_HDR_Z 0x5a /* 'Z' */
|
||||
#define BZ_HDR_h 0x68 /* 'h' */
|
||||
#define BZ_HDR_0 0x30 /* '0' */
|
||||
|
||||
/*-- Constants for the back end. --*/
|
||||
|
||||
#define BZ_MAX_ALPHA_SIZE 258
|
||||
|
61
bzmore
Normal file
@ -0,0 +1,61 @@
|
||||
#!/bin/sh
|
||||
|
||||
# Bzmore wrapped for bzip2,
|
||||
# adapted from zmore by Philippe Troin <phil@fifi.org> for Debian GNU/Linux.
|
||||
|
||||
PATH="/usr/bin:$PATH"; export PATH
|
||||
|
||||
prog=`echo $0 | sed 's|.*/||'`
|
||||
case "$prog" in
|
||||
*less) more=less ;;
|
||||
*) more=more ;;
|
||||
esac
|
||||
|
||||
if test "`echo -n a`" = "-n a"; then
|
||||
# looks like a SysV system:
|
||||
n1=''; n2='\c'
|
||||
else
|
||||
n1='-n'; n2=''
|
||||
fi
|
||||
oldtty=`stty -g 2>/dev/null`
|
||||
if stty -cbreak 2>/dev/null; then
|
||||
cb='cbreak'; ncb='-cbreak'
|
||||
else
|
||||
# 'stty min 1' resets eof to ^a on both SunOS and SysV!
|
||||
cb='min 1 -icanon'; ncb='icanon eof ^d'
|
||||
fi
|
||||
if test $? -eq 0 -a -n "$oldtty"; then
|
||||
trap 'stty $oldtty 2>/dev/null; exit' 0 2 3 5 10 13 15
|
||||
else
|
||||
trap 'stty $ncb echo 2>/dev/null; exit' 0 2 3 5 10 13 15
|
||||
fi
|
||||
|
||||
if test $# = 0; then
|
||||
if test -t 0; then
|
||||
echo usage: $prog files...
|
||||
else
|
||||
bzip2 -cdfq | eval $more
|
||||
fi
|
||||
else
|
||||
FIRST=1
|
||||
for FILE
|
||||
do
|
||||
if test $FIRST -eq 0; then
|
||||
echo $n1 "--More--(Next file: $FILE)$n2"
|
||||
stty $cb -echo 2>/dev/null
|
||||
ANS=`dd bs=1 count=1 2>/dev/null`
|
||||
stty $ncb echo 2>/dev/null
|
||||
echo " "
|
||||
if test "$ANS" = 'e' -o "$ANS" = 'q'; then
|
||||
exit
|
||||
fi
|
||||
fi
|
||||
if test "$ANS" != 's'; then
|
||||
echo "------> $FILE <------"
|
||||
bzip2 -cdfq "$FILE" | eval $more
|
||||
fi
|
||||
if test -t 1; then
|
||||
FIRST=0
|
||||
fi
|
||||
done
|
||||
fi
|
152
bzmore.1
Normal file
@ -0,0 +1,152 @@
|
||||
.\"Shamelessly copied from zmore.1 by Philippe Troin <phil@fifi.org>
|
||||
.\"for Debian GNU/Linux
|
||||
.TH BZMORE 1
|
||||
.SH NAME
|
||||
bzmore, bzless \- file perusal filter for crt viewing of bzip2 compressed text
|
||||
.SH SYNOPSIS
|
||||
.B bzmore
|
||||
[ name ... ]
|
||||
.br
|
||||
.B bzless
|
||||
[ name ... ]
|
||||
.SH NOTE
|
||||
In the following description,
|
||||
.I bzless
|
||||
and
|
||||
.I less
|
||||
can be used interchangeably with
|
||||
.I bzmore
|
||||
and
|
||||
.I more.
|
||||
.SH DESCRIPTION
|
||||
.I Bzmore
|
||||
is a filter which allows examination of compressed or plain text files
|
||||
one screenful at a time on a soft-copy terminal.
|
||||
.I bzmore
|
||||
works on files compressed with
|
||||
.I bzip2
|
||||
and also on uncompressed files.
|
||||
If a file does not exist,
|
||||
.I bzmore
|
||||
looks for a file of the same name with the addition of a .bz2 suffix.
|
||||
.PP
|
||||
.I Bzmore
|
||||
normally pauses after each screenful, printing --More--
|
||||
at the bottom of the screen.
|
||||
If the user then types a carriage return, one more line is displayed.
|
||||
If the user hits a space,
|
||||
another screenful is displayed. Other possibilities are enumerated later.
|
||||
.PP
|
||||
.I Bzmore
|
||||
looks in the file
|
||||
.I /etc/termcap
|
||||
to determine terminal characteristics,
|
||||
and to determine the default window size.
|
||||
On a terminal capable of displaying 24 lines,
|
||||
the default window size is 22 lines.
|
||||
Other sequences which may be typed when
|
||||
.I bzmore
|
||||
pauses, and their effects, are as follows (\fIi\fP is an optional integer
|
||||
argument, defaulting to 1) :
|
||||
.PP
|
||||
.IP \fIi\|\fP<space>
|
||||
display
|
||||
.I i
|
||||
more lines, (or another screenful if no argument is given)
|
||||
.PP
|
||||
.IP ^D
|
||||
display 11 more lines (a ``scroll'').
|
||||
If
|
||||
.I i
|
||||
is given, then the scroll size is set to \fIi\|\fP.
|
||||
.PP
|
||||
.IP d
|
||||
same as ^D (control-D)
|
||||
.PP
|
||||
.IP \fIi\|\fPz
|
||||
same as typing a space except that \fIi\|\fP, if present, becomes the new
|
||||
window size. Note that the window size reverts back to the default at the
|
||||
end of the current file.
|
||||
.PP
|
||||
.IP \fIi\|\fPs
|
||||
skip \fIi\|\fP lines and print a screenful of lines
|
||||
.PP
|
||||
.IP \fIi\|\fPf
|
||||
skip \fIi\fP screenfuls and print a screenful of lines
|
||||
.PP
|
||||
.IP "q or Q"
|
||||
quit reading the current file; go on to the next (if any)
|
||||
.PP
|
||||
.IP "e or q"
|
||||
When the prompt --More--(Next file:
|
||||
.IR file )
|
||||
is printed, this command causes bzmore to exit.
|
||||
.PP
|
||||
.IP s
|
||||
When the prompt --More--(Next file:
|
||||
.IR file )
|
||||
is printed, this command causes bzmore to skip the next file and continue.
|
||||
.PP
|
||||
.IP =
|
||||
Display the current line number.
|
||||
.PP
|
||||
.IP \fIi\|\fP/expr
|
||||
search for the \fIi\|\fP-th occurrence of the regular expression \fIexpr.\fP
|
||||
If the pattern is not found,
|
||||
.I bzmore
|
||||
goes on to the next file (if any).
|
||||
Otherwise, a screenful is displayed, starting two lines before the place
|
||||
where the expression was found.
|
||||
The user's erase and kill characters may be used to edit the regular
|
||||
expression.
|
||||
Erasing back past the first column cancels the search command.
|
||||
.PP
|
||||
.IP \fIi\|\fPn
|
||||
search for the \fIi\|\fP-th occurrence of the last regular expression entered.
|
||||
.PP
|
||||
.IP !command
|
||||
invoke a shell with \fIcommand\|\fP.
|
||||
The characters `!' in "command" are replaced with the
|
||||
previous shell command. The sequence "\\!" is replaced by "!".
|
||||
.PP
|
||||
.IP ":q or :Q"
|
||||
quit reading the current file; go on to the next (if any)
|
||||
(same as q or Q).
|
||||
.PP
|
||||
.IP .
|
||||
(dot) repeat the previous command.
|
||||
.PP
|
||||
The commands take effect immediately, i.e., it is not necessary to
|
||||
type a carriage return.
|
||||
Up to the time when the command character itself is given,
|
||||
the user may hit the line kill character to cancel the numerical
|
||||
argument being formed.
|
||||
In addition, the user may hit the erase character to redisplay the
|
||||
--More-- message.
|
||||
.PP
|
||||
At any time when output is being sent to the terminal, the user can
|
||||
hit the quit key (normally control\-\\).
|
||||
.I Bzmore
|
||||
will stop sending output, and will display the usual --More--
|
||||
prompt.
|
||||
The user may then enter one of the above commands in the normal manner.
|
||||
Unfortunately, some output is lost when this is done, due to the
|
||||
fact that any characters waiting in the terminal's output queue
|
||||
are flushed when the quit signal occurs.
|
||||
.PP
|
||||
The terminal is set to
|
||||
.I noecho
|
||||
mode by this program so that the output can be continuous.
|
||||
What you type will thus not show on your terminal, except for the / and !
|
||||
commands.
|
||||
.PP
|
||||
If the standard output is not a teletype, then
|
||||
.I bzmore
|
||||
acts just like
|
||||
.I bzcat,
|
||||
except that a header is printed before each file.
|
||||
.SH FILES
|
||||
.DT
|
||||
/etc/termcap Terminal data base
|
||||
.SH "SEE ALSO"
|
||||
more(1), less(1), bzip2(1), bzdiff(1), bzgrep(1)
|
10
compress.c
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -663,10 +663,10 @@ void BZ2_compressBlock ( EState* s, Bool is_last_block )
|
||||
/*-- If this is the first block, create the stream header. --*/
|
||||
if (s->blockNo == 1) {
|
||||
BZ2_bsInitWrite ( s );
|
||||
bsPutUChar ( s, 'B' );
|
||||
bsPutUChar ( s, 'Z' );
|
||||
bsPutUChar ( s, 'h' );
|
||||
bsPutUChar ( s, (UChar)('0' + s->blockSize100k) );
|
||||
bsPutUChar ( s, BZ_HDR_B );
|
||||
bsPutUChar ( s, BZ_HDR_Z );
|
||||
bsPutUChar ( s, BZ_HDR_h );
|
||||
bsPutUChar ( s, (UChar)(BZ_HDR_0 + s->blockSize100k) );
|
||||
}
|
||||
|
||||
if (s->nblock > 0) {
|
||||
|
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
|
14
decompress.c
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -235,18 +235,18 @@ Int32 BZ2_decompress ( DState* s )
|
||||
switch (s->state) {
|
||||
|
||||
GET_UCHAR(BZ_X_MAGIC_1, uc);
|
||||
if (uc != 'B') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
if (uc != BZ_HDR_B) RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
|
||||
GET_UCHAR(BZ_X_MAGIC_2, uc);
|
||||
if (uc != 'Z') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
if (uc != BZ_HDR_Z) RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
|
||||
GET_UCHAR(BZ_X_MAGIC_3, uc)
|
||||
if (uc != 'h') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
if (uc != BZ_HDR_h) RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
|
||||
GET_BITS(BZ_X_MAGIC_4, s->blockSize100k, 8)
|
||||
if (s->blockSize100k < '1' ||
|
||||
s->blockSize100k > '9') RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
s->blockSize100k -= '0';
|
||||
if (s->blockSize100k < (BZ_HDR_0 + 1) ||
|
||||
s->blockSize100k > (BZ_HDR_0 + 9)) RETURN(BZ_DATA_ERROR_MAGIC);
|
||||
s->blockSize100k -= BZ_HDR_0;
|
||||
|
||||
if (s->smallDecompress) {
|
||||
s->ll16 = BZALLOC( s->blockSize100k * 100000 * sizeof(UInt16) );
|
||||
|
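The decompress.c hunk above replaces character literals with the numeric BZ_HDR_* constants, so the header comparison no longer depends on the compiler's execution character set (on a non-ASCII platform, `'B'` need not equal 0x42). A minimal sketch of the same check as a standalone function, reusing the constant values from the bzlib_private.h hunk; `parse_stream_header` is a hypothetical name, and the real decoder reads these bytes incrementally through its state machine rather than from a buffer.

```c
#include <assert.h>
#include <stddef.h>

#define BZ_HDR_B 0x42   /* 'B' */
#define BZ_HDR_Z 0x5a   /* 'Z' */
#define BZ_HDR_h 0x68   /* 'h' */
#define BZ_HDR_0 0x30   /* '0' */

/* Return the block-size level (1..9) encoded in a 4-byte stream header,
   or -1 if the buffer is too short or the magic bytes do not match.
   Hypothetical helper, for illustration only. */
int parse_stream_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4) return -1;
    if (buf[0] != BZ_HDR_B || buf[1] != BZ_HDR_Z || buf[2] != BZ_HDR_h)
        return -1;
    /* Fourth byte is the digit '1'..'9' in ASCII, whatever the host charset */
    if (buf[3] < BZ_HDR_0 + 1 || buf[3] > BZ_HDR_0 + 9)
        return -1;
    return buf[3] - BZ_HDR_0;
}
```

Note that level 0 is rejected, matching the `BZ_HDR_0 + 1` lower bound in the hunk.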
@ -19,7 +19,7 @@
|
||||
|
||||
#ifdef _WIN32
|
||||
|
||||
#define BZ2_LIBNAME "libbz2-1.0.0.DLL"
|
||||
#define BZ2_LIBNAME "libbz2-1.0.2.DLL"
|
||||
|
||||
#include <windows.h>
|
||||
static int BZ2DLLLoaded = 0;
|
||||
@ -130,8 +130,8 @@ int main(int argc,char *argv[])
|
||||
}else{
|
||||
fp_w = stdout;
|
||||
}
|
||||
if((BZ2fp_r == NULL && (BZ2fp_r = BZ2_bzdopen(fileno(stdin),"rb"))==NULL)
|
||||
|| (BZ2fp_r != NULL && (BZ2fp_r = BZ2_bzopen(fn_r,"rb"))==NULL)){
|
||||
if((fn_r == NULL && (BZ2fp_r = BZ2_bzdopen(fileno(stdin),"rb"))==NULL)
|
||||
|| (fn_r != NULL && (BZ2fp_r = BZ2_bzopen(fn_r,"rb"))==NULL)){
|
||||
printf("can't bz2openstream\n");
|
||||
exit(1);
|
||||
}
|
||||
|
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
|
@ -4,7 +4,7 @@
|
||||
# Fixed up by JRS for bzip2-0.9.5d release.
|
||||
|
||||
CC=cl
|
||||
CFLAGS= -DWIN32 -MD -Ox -D_FILE_OFFSET_BITS=64
|
||||
CFLAGS= -DWIN32 -MD -Ox -D_FILE_OFFSET_BITS=64 -nologo
|
||||
|
||||
OBJS= blocksort.obj \
|
||||
huffman.obj \
|
||||
|
114
manual.texi
@ -2,10 +2,10 @@
|
||||
@setfilename bzip2.info
|
||||
|
||||
@ignore
|
||||
This file documents bzip2 version 1.0, and associated library
|
||||
This file documents bzip2 version 1.0.2, and associated library
|
||||
libbzip2, written by Julian Seward (jseward@acm.org).
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward
|
||||
Copyright (C) 1996-2002 Julian R Seward
|
||||
|
||||
Permission is granted to make and distribute verbatim copies of
|
||||
this manual provided the copyright notice and this permission notice
|
||||
@ -30,8 +30,8 @@ END-INFO-DIR-ENTRY
|
||||
@titlepage
|
||||
@title bzip2 and libbzip2
|
||||
@subtitle a program and library for data compression
|
||||
@subtitle copyright (C) 1996-2000 Julian Seward
|
||||
@subtitle version 1.0 of 21 March 2000
|
||||
@subtitle copyright (C) 1996-2002 Julian Seward
|
||||
@subtitle version 1.0.2 of 30 December 2001
|
||||
@author Julian Seward
|
||||
|
||||
@end titlepage
|
||||
@ -40,11 +40,17 @@ END-INFO-DIR-ENTRY
|
||||
@parskip 2mm
|
||||
|
||||
@end iftex
|
||||
@node Top, Overview, (dir), (dir)
|
||||
@node Top,,, (dir)
|
||||
|
||||
The following text is the License for this software. You should
|
||||
find it identical to that contained in the file LICENSE in the
|
||||
source distribution.
|
||||
|
||||
@bf{------------------ START OF THE LICENSE ------------------}
|
||||
|
||||
This program, @code{bzip2},
|
||||
and associated library @code{libbzip2}, are
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
@ -82,14 +88,16 @@ Julian Seward, Cambridge, UK.
|
||||
|
||||
@code{jseward@@acm.org}
|
||||
|
||||
@code{http://sourceware.cygnus.com/bzip2}
|
||||
@code{bzip2}/@code{libbzip2} version 1.0.2 of 30 December 2001.
|
||||
|
||||
@bf{------------------ END OF THE LICENSE ------------------}
|
||||
|
||||
Web sites:
|
||||
|
||||
@code{http://sources.redhat.com/bzip2}
|
||||
|
||||
@code{http://www.cacheprof.org}
|
||||
|
||||
@code{http://www.muraroa.demon.co.uk}
|
||||
|
||||
@code{bzip2}/@code{libbzip2} version 1.0 of 21 March 2000.
|
||||
|
||||
PATENTS: To the best of my knowledge, @code{bzip2} does not use any patented
|
||||
algorithms. However, I do not have the resources available to carry out
|
||||
a full patent search. Therefore I cannot give any guarantee of the
|
||||
@ -101,7 +109,6 @@ above statement.
|
||||
|
||||
|
||||
|
||||
@node Overview, Implementation, Top, Top
|
||||
@chapter Introduction
|
||||
|
||||
@code{bzip2} compresses files using the Burrows-Wheeler
|
||||
@ -134,7 +141,7 @@ and nothing else.
|
||||
@unnumberedsubsubsec NAME
|
||||
@itemize
|
||||
@item @code{bzip2}, @code{bunzip2}
|
||||
- a block-sorting file compressor, v1.0
|
||||
- a block-sorting file compressor, v1.0.2
|
||||
@item @code{bzcat}
|
||||
- decompresses files to stdout
|
||||
@item @code{bzip2recover}
|
||||
@ -264,6 +271,11 @@ This really performs a trial decompression and throws away the result.
|
||||
Force overwrite of output files. Normally, @code{bzip2} will not overwrite
|
||||
existing output files. Also forces @code{bzip2} to break hard links
|
||||
to files, which it otherwise wouldn't do.
|
||||
|
||||
@code{bzip2} normally declines to decompress files which don't have the
|
||||
correct magic header bytes. If forced (@code{-f}), however, it will
|
||||
pass such files through unmodified. This is how GNU @code{gzip}
|
||||
behaves.
|
||||
@item -k --keep
|
||||
Keep (don't delete) input files during compression
|
||||
or decompression.
|
||||
@ -286,9 +298,13 @@ Further @code{-v}'s increase the verbosity level, spewing out lots of
|
||||
information which is primarily of interest for diagnostic purposes.
|
||||
@item -L --license -V --version
|
||||
Display the software version, license terms and conditions.
|
||||
@item -1 to -9
|
||||
@item -1 (or --fast) to -9 (or --best)
|
||||
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no
|
||||
effect when decompressing. See MEMORY MANAGEMENT below.
|
||||
The @code{--fast} and @code{--best} aliases are primarily for GNU
|
||||
@code{gzip} compatibility. In particular, @code{--fast} doesn't make
|
||||
things significantly faster. And @code{--best} merely selects the
|
||||
default behaviour.
|
||||
@item --
|
||||
Treats all subsequent arguments as file names, even if they start
|
||||
with a dash. This is so you can handle files with names beginning
|
||||
@ -389,21 +405,19 @@ integrity of the resulting files, and decompress those which are
|
||||
undamaged.
|
||||
|
||||
@code{bzip2recover}
|
||||
takes a single argument, the name of the damaged file,
|
||||
and writes a number of files @code{rec0001file.bz2},
|
||||
@code{rec0002file.bz2}, etc, containing the extracted blocks.
|
||||
The output filenames are designed so that the use of
|
||||
wildcards in subsequent processing -- for example,
|
||||
@code{bzip2 -dc rec*file.bz2 > recovered_data} -- lists the files in
|
||||
takes a single argument, the name of the damaged file, and writes a
|
||||
number of files @code{rec00001file.bz2}, @code{rec00002file.bz2}, etc,
|
||||
containing the extracted blocks. The output filenames are designed so
|
||||
that the use of wildcards in subsequent processing -- for example,
|
||||
@code{bzip2 -dc rec*file.bz2 > recovered_data} -- processes the files in
|
||||
the correct order.
|
||||
|
||||
@code{bzip2recover} should be of most use dealing with large @code{.bz2}
|
||||
files, as these will contain many blocks. It is clearly
|
||||
futile to use it on damaged single-block files, since a
|
||||
damaged block cannot be recovered. If you wish to minimise
|
||||
any potential data loss through media or transmission errors,
|
||||
you might consider compressing with a smaller
|
||||
block size.
|
||||
files, as these will contain many blocks. It is clearly futile to use
|
||||
it on damaged single-block files, since a damaged block cannot be
|
||||
recovered. If you wish to minimise any potential data loss through
|
||||
media or transmission errors, you might consider compressing with a
|
||||
smaller block size.
|
||||
|
||||
|
||||
@unnumberedsubsubsec PERFORMANCE NOTES
|
||||
@ -435,22 +449,31 @@ I/O error messages are not as helpful as they could be. @code{bzip2}
|
||||
tries hard to detect I/O errors and exit cleanly, but the details of
|
||||
what the problem is sometimes seem rather misleading.
|
||||
|
||||
This manual page pertains to version 1.0 of @code{bzip2}. Compressed
|
||||
This manual page pertains to version 1.0.2 of @code{bzip2}. Compressed
|
||||
data created by this version is entirely forwards and backwards
|
||||
compatible with the previous public releases, versions 0.1pl2, 0.9.0 and
|
||||
0.9.5, but with the following exception: 0.9.0 and above can correctly
|
||||
decompress multiple concatenated compressed files. 0.1pl2 cannot do
|
||||
this; it will stop after decompressing just the first file in the
|
||||
stream.
|
||||
compatible with the previous public releases, versions 0.1pl2, 0.9.0,
|
||||
0.9.5, 1.0.0 and 1.0.1, but with the following exception: 0.9.0 and
|
||||
above can correctly decompress multiple concatenated compressed files.
|
||||
0.1pl2 cannot do this; it will stop after decompressing just the first
|
||||
file in the stream.
|
||||
|
||||
@code{bzip2recover} versions prior to this one, 1.0.2, used 32-bit
|
||||
integers to represent bit positions in compressed files, so it could not
|
||||
handle compressed files more than 512 megabytes long. Version 1.0.2 and
|
||||
above uses 64-bit ints on some platforms which support them (GNU
|
||||
supported targets, and Windows). To establish whether or not
|
||||
@code{bzip2recover} was built with such a limitation, run it without
|
||||
arguments. In any event you can build yourself an unlimited version if
|
||||
you can recompile it with @code{MaybeUInt64} set to be an unsigned
|
||||
64-bit integer.
|
||||
|
||||
@code{bzip2recover} uses 32-bit integers to represent bit positions in
|
||||
compressed files, so it cannot handle compressed files more than 512
|
||||
megabytes long. This could easily be fixed.
|
||||
|
||||
|
||||
@unnumberedsubsubsec AUTHOR
|
||||
Julian Seward, @code{jseward@@acm.org}.
|
||||
|
||||
@code{http://sources.redhat.com/bzip2}
|
||||
|
||||
The ideas embodied in @code{bzip2} are due to (at least) the following
|
||||
people: Michael Burrows and David Wheeler (for the block sorting
|
||||
transformation), David Wheeler (again, for the Huffman coder), Peter
|
||||
@ -461,8 +484,9 @@ indebted for their help, support and advice. See the manual in the
|
||||
source distribution for pointers to sources of documentation. Christian
|
||||
von Roques encouraged me to look for faster sorting algorithms, so as to
|
||||
speed up compression. Bela Lubkin encouraged me to improve the
|
||||
worst-case compression performance. Many people sent patches, helped
|
||||
with portability problems, lent machines, gave advice and were generally
|
||||
worst-case compression performance. The @code{bz*} scripts are derived
|
||||
from those of GNU @code{gzip}. Many people sent patches, helped with
|
||||
portability problems, lent machines, gave advice and were generally
|
||||
helpful.
|
||||
|
||||
@end quotation
|
||||
@ -1769,16 +1793,20 @@ was compiled with @code{BZ_NO_STDIO} set.
|
||||
For a normal compile, an assertion failure yields the message
|
||||
@example
|
||||
bzip2/libbzip2: internal error number N.
|
||||
This is a bug in bzip2/libbzip2, 1.0 of 21-Mar-2000.
|
||||
This is a bug in bzip2/libbzip2, 1.0.2, 30-Dec-2001.
|
||||
Please report it to me at: jseward@@acm.org. If this happened
|
||||
when you were using some program which uses libbzip2 as a
|
||||
component, you should also report this bug to the author(s)
|
||||
of that program. Please make an effort to report this bug;
|
||||
timely and accurate bug reports eventually lead to higher
|
||||
quality software. Thanks. Julian Seward, 21 March 2000.
|
||||
quality software. Thanks. Julian Seward, 30 December 2001.
|
||||
@end example
|
||||
where @code{N} is some error code number. @code{exit(3)}
|
||||
is then called.
|
||||
where @code{N} is some error code number. If @code{N == 1007}, it also
|
||||
prints some extra text advising the reader that unreliable memory is
|
||||
often associated with internal error 1007. (This is a
|
||||
frequently-observed-phenomenon with versions 1.0.0/1.0.1).
|
||||
|
||||
@code{exit(3)} is then called.
|
||||
|
||||
For a @code{stdio}-free library, assertion failures result
|
||||
in a call to a function declared as:
|
||||
@ -2056,10 +2084,10 @@ Maybe this isn't what you want.
|
||||
If you want a compressor and/or library which is faster, uses less
|
||||
memory but gets pretty good compression, and has minimal latency,
|
||||
consider Jean-loup
|
||||
Gailly's and Mark Adler's work, @code{zlib-1.1.2} and
|
||||
Gailly's and Mark Adler's work, @code{zlib-1.1.3} and
|
||||
@code{gzip-1.2.4}. Look for them at
|
||||
|
||||
@code{http://www.cdrom.com/pub/infozip/zlib} and
|
||||
@code{http://www.zlib.org} and
|
||||
@code{http://www.gzip.org} respectively.
|
||||
|
||||
For something faster and lighter still, you might try Markus F X J
|
||||
|
16
mk251.c
Normal file
@ -0,0 +1,16 @@
|
||||
|
||||
/* Spew out a long sequence of the byte 251. When fed to bzip2
|
||||
versions 1.0.0 or 1.0.1, causes it to die with internal error
|
||||
1007 in blocksort.c. This assertion misses an extremely rare
|
||||
case, which is fixed in this version (1.0.2) and above.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
|
||||
int main ()
|
||||
{
|
||||
int i;
|
||||
for (i = 0; i < 48500000 ; i++)
|
||||
putchar(251);
|
||||
return 0;
|
||||
}
|
@ -8,7 +8,7 @@
|
||||
This file is a part of bzip2 and/or libbzip2, a program and
|
||||
library for lossless, block-sorting data compression.
|
||||
|
||||
Copyright (C) 1996-2000 Julian R Seward. All rights reserved.
|
||||
Copyright (C) 1996-2002 Julian R Seward. All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions
|
||||
|
4
words3
@ -15,8 +15,8 @@ not actually execute them.
|
||||
|
||||
Instructions for use are in the preformatted manual page, in the file
|
||||
bzip2.txt. For more detailed documentation, read the full manual.
|
||||
It is available in Postscript form (manual.ps) and HTML form
|
||||
(manual_toc.html).
|
||||
It is available in Postscript form (manual.ps), PDF form (manual.pdf),
|
||||
and HTML form (manual_toc.html).
|
||||
|
||||
You can also do "bzip2 --help" to see some helpful information.
|
||||
"bzip2 -L" displays the software license.
|
||||
|