12be335ada
tempted to let one do uconv -t utf-8 -f latin1 file1 -f euc-jp file2 so that many files of various encodings could be converted at the same time to a single encoding, but will do that later after cleaning up the sloppy way I enabled multiple files for today. X-SVN-Rev: 7416
265 lines
5.9 KiB
Groff
.\" Hey, Emacs! This is -*-nroff-*- you know...
.\"
.\" uconv.1: manual page for the uconv utility.
.\"
.\" Copyright (C) 2000-2001 IBM, Inc. and others.
.\"
.\" Manual page by Yves Arrouye <yves@realnames.com>.
.\"
.TH UCONV 1 "9 November 2001" "ICU MANPAGE" "ICU @VERSION@ Manual"
.SH NAME
.B uconv
\- convert data from one encoding to another
.SH SYNOPSIS
.B uconv
[
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
]
[
.BI "\-V\fP, \fB\-\-version"
]
[
.BI "\-s\fP, \fB\-\-silent"
]
[
.BI "\-v\fP, \fB\-\-verbose"
]
[
.BI "\-l\fP, \fB\-\-list"
|
.BI "\-\-list\-code" " code"
|
.BI "\-\-default\-code"
|
.BI "\-L\fP, \fB\-\-list\-transliterators"
]
[
.BI "\-\-canon"
]
[
.BI "\-x" " transliterator"
]
[
.BI "\-\-to\-callback" " callback"
|
.B "\-c"
]
[
.BI "\-\-from\-callback" " callback"
|
.B "\-i"
]
[
.BI "\-\-callback" " callback"
]
.BI "\-f\fP, \fB\-\-from\-code" " encoding"
.BI "\-t\fP, \fB\-\-to\-code" " encoding"
[
.IR file .\|.\|.
]
[
.BI "\-o\fP, \fB\-\-output" " file"
]
.SH DESCRIPTION
.B uconv
converts each given
.I file
(or its standard input if no
.I file
is specified) from one
.I encoding
to another. The transcoding is done using Unicode as a pivot encoding
(i.e. the data are first transcoded from their original encoding to
Unicode, and then from Unicode to the destination encoding).
Callbacks can be specified to handle invalid sequences in the input,
or characters that cannot be transcoded to the destination encoding.
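.PP
For example, assuming a Latin-1 file named
.I file1
(a placeholder name), a command such as the following would write its
UTF-8 equivalent to the placeholder file
.IR file1.utf8 .
The encoding names
.B latin1
and
.B utf\-8
are assumed to be among those printed by
.BR "\-l\fP, \fB\-\-list" .
.PP
.RS
.nf
.\" Illustration only: file1 and file1.utf8 are placeholder file names.
uconv \-f latin1 \-t utf\-8 \-o file1.utf8 file1
.fi
.RE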
.PP
.B uconv
can also run the transcoding through a specified
.IR transliterator ,
in which case transliteration happens as an intermediate step:
after the data have been transcoded to Unicode, and before they are
transcoded to the destination encoding.
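.PP
As an illustration, the available transliterator names can first be
listed with
.BR "\-L\fP, \fB\-\-list\-transliterators" ,
and one of them then passed to
.BR \-x .
In the sketch below,
.I name
is a placeholder for one of the listed names, not a specific
transliterator, and the file names are placeholders as well.
.PP
.RS
.nf
.\" Illustration only: "name" stands for a transliterator printed by -L;
.\" file2 and file2.utf8 are placeholder file names.
uconv \-L
uconv \-x \fIname\fP \-f euc\-jp \-t utf\-8 \-o file2.utf8 file2
.fi
.RE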
.SH OPTIONS
.TP
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
Print help about usage and exit.
.TP
.BI "\-V\fP, \fB\-\-version"
Print the version of
.B uconv
and exit.
.TP
.BI "\-s\fP, \fB\-\-silent"
Suppress messages during execution.
.TP
.BI "\-v\fP, \fB\-\-verbose"
Display extra informative messages during execution.
.TP
.BI "\-l\fP, \fB\-\-list"
List all the available encodings and exit.
.TP
.BI "\-\-list\-code" " code"
List only the
.I code
encoding and exit. If
.I code
is not a valid encoding, exit with an error.
.TP
.BI "\-\-default\-code"
List only the name of the default encoding and exit.
.TP
.BI "\-L\fP, \fB\-\-list\-transliterators"
List all the available transliterators and exit.
.TP
.BI "\-\-canon"
If used with
.BI "\-l\fP, \fB\-\-list"
or
.BR "\-\-default\-code" ,
the list of encodings is produced in a format compatible with
.BR convrtrs.txt (5).
If used with
.BR "\-L\fP, \fB\-\-list\-transliterators" ,
print only one transliterator name per line.
.TP
.BI "\-x" " transliterator"
Run the transcoded Unicode data through the given
.I transliterator
and use the transliterated data as input for the transcoding to
the destination encoding.
.TP
.BI "\-\-to\-callback" " callback"
Use
.I callback
to handle characters that cannot be transcoded to the destination
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.B "\-c"
Omit from the output any characters that cannot be transcoded to the
destination encoding.
Same as
.BR "\-\-to\-callback skip" .
.TP
.BI "\-\-from\-callback" " callback"
Use
.I callback
to handle characters that cannot be transcoded from the original
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.B "\-i"
Ignore invalid sequences in the input.
Same as
.BR "\-\-from\-callback skip" .
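.IP
As an illustration, the two shorthand options can be combined so that
invalid input sequences are dropped and characters with no Latin-1
equivalent are omitted, instead of stopping with an error as the
default
.B stop
callback does. The file names below are placeholders:
.IP
.nf
.\" Illustration only: file1 and file1.latin1 are placeholder file names.
uconv \-i \-c \-f utf\-8 \-t latin1 \-o file1.latin1 file1
.fi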
.TP
.BI "\-\-callback" " callback"
Use
.I callback
to handle both characters that cannot be transcoded from the original
encoding and characters that cannot be transcoded to the destination
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.BI "\-f\fP, \fB\-\-from\-code" " encoding"
Set the original encoding of the data to
.IR encoding .
.TP
.BI "\-t\fP, \fB\-\-to\-code" " encoding"
Transcode the data to
.IR encoding .
.TP
.BI "\-o\fP, \fB\-\-output" " file"
Write the transcoded data to
.IR file .
.SH CALLBACKS
.B uconv
supports specifying callbacks to handle invalid data. Callbacks can be
set for both directions of transcoding: from the original encoding to
Unicode, with the
.BR "\-\-from\-callback"
option, and from Unicode to the destination encoding, with the
.BR "\-\-to\-callback"
option.
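.PP
For example, the following command would drop invalid EUC-JP input
sequences and write characters that have no Latin-1 equivalent as XML
hexadecimal character references, using callback names described
below. The file names are placeholders:
.PP
.RS
.nf
.\" Illustration only: file2 and file2.latin1 are placeholder file names.
uconv \-\-from\-callback skip \-\-to\-callback escape\-xml\-hex \e
    \-f euc\-jp \-t latin1 \-o file2.latin1 file2
.fi
.RE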
.PP
The following is a list of valid
.I callback
names, along with a description of their behavior. The list of
callbacks actually supported by
.B uconv
is displayed when it is called with
.BR "\-h\fP, \fB\-\-help" .
.PP
.TP \w'\fBescape-unicode\fP'u+3n
.B substitute
Write the encoding's substitute sequence, or the Unicode
replacement character
.B U+FFFD
when transcoding to Unicode.
.TP
.B skip
Ignore the invalid data.
.TP
.B stop
Stop with an error when encountering invalid data.
This is the default callback.
.TP
.B escape
Same as
.BR escape-icu .
.TP
.B escape-icu
Replace the missing characters with a string of the format
.BR %U\fIhhhh\fP ,
where
.I hhhh
is the hexadecimal value of the character.
.TP
.B escape-java
Replace the missing characters with a string of the format
.BR "\eu\fIhhhh\fP" ,
where
.I hhhh
is the hexadecimal value of the character.
.TP
.B escape-c
Replace the missing characters with a string of the format
.BR \eu\fIhhhh\fP
for characters in plane 0, and
.BR \eu\fIhhhh\fP\eu\fIhhhh\fP
for characters in planes 1 and above,
where
.I hhhh
is a hexadecimal value: that of the character itself in plane 0, and
that of each half of its surrogate pair in planes 1 and above.
.TP
.B escape-xml
Same as
.BR escape-xml-dec .
.TP
.B escape-xml-dec
Replace the missing characters with a string of the format
.BR &#\fInnnn\fP; ,
where
.I nnnn
is the decimal value of the character.
.TP
.B escape-xml-hex
Replace the missing characters with a string of the format
.BR &#x\fIhhhh\fP; ,
where
.I hhhh
is the hexadecimal value of the character.
.TP
.B escape-unicode
Replace the missing characters with a string of the format
.BR U+\fIhhhh\fP ,
where
.I hhhh
is the hexadecimal value of the character. This is the format
commonly used to denote a Unicode code point in the literature.
.SH VERSION
@VERSION@
.SH COPYRIGHT
Copyright (C) 2001 IBM, Inc. and others.
.SH SEE ALSO
.BR convrtrs.txt (5)