.\" Hey, Emacs! This is -*-nroff-*- you know...
.\"
.\" uconv.1: manual page for the uconv utility.
.\"
.\" Copyright (C) 2000-2001 IBM, Inc. and others.
.\"
.\" Manual page by Yves Arrouye <yves@realnames.com>.
.\"
.TH UCONV 1 "9 November 2001" "ICU MANPAGE" "ICU @VERSION@ Manual"
.SH NAME
.B uconv
\- convert data from one encoding to another
.SH SYNOPSIS
.B uconv
[
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
]
[
.BI "\-V\fP, \fB\-\-version"
]
[
.BI "\-s\fP, \fB\-\-silent"
]
[
.BI "\-v\fP, \fB\-\-verbose"
]
[
.BI "\-l\fP, \fB\-\-list"
|
.BI "\-l\fP, \fB\-\-list\-code" " code"
|
.BI "\-\-default\-code"
|
.BI "\-L\fP, \fB\-\-list\-transliterators"
]
[
.BI "\-\-canon"
]
[
.BI "\-x" " transliterator"
]
[
.BI "\-\-to\-callback" " callback"
|
.B "\-c"
]
[
.BI "\-\-from\-callback" " callback"
|
.B "\-i"
]
[
.BI "\-\-callback" " callback"
]
[
.BI "\-\-fallback"
|
.BI "\-\-no\-fallback"
]
.BI "\-f\fP, \fB\-\-from\-code" " encoding"
.BI "\-t\fP, \fB\-\-to\-code" " encoding"
[
.IR file .\|.\|.
]
[
.BI "\-o\fP, \fB\-\-output" " file"
]
.SH DESCRIPTION
.B uconv
converts, or transcodes, each given
.I file
(or its standard input if no
.I file
is specified) from one
.I encoding
to another. The transcoding is done using Unicode as a pivot encoding
(i.e. the data are first transcoded from their original encoding to
Unicode, and then from Unicode to the destination encoding).
.PP
When calling
.BR uconv ,
it is possible to specify callbacks that are used to handle invalid
characters in the input, or characters that cannot be transcoded to
the destination encoding. Some encodings, for example, offer a default
substitution character that can be used to represent the occurrence of
such characters in the input. Other callbacks offer a useful visual
representation of the invalid data.
.PP
.B uconv
can also run the transcoding through a specified
.IR transliterator ,
in which case transliteration will happen as an intermediate step,
after the data have been transcoded to Unicode.
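.PP
As a minimal sketch of the conversion described above, the following
command transcodes a file and writes the result to another file.
The file names
.I input.txt
and
.I output.txt
are hypothetical, and the encoding names
.B iso\-8859\-1
and
.B utf\-8
are assumed to be available in the local ICU installation (this can be
checked with
.BR "\-l" ):
.PP
.RS
.nf
uconv \-f iso\-8859\-1 \-t utf\-8 input.txt \-o output.txt
.fi
.RE
.PP
If no
.I file
is given, the same command reads its standard input instead.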
.SH OPTIONS
.TP
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
Print help about usage and exit.
.TP
.BR "\-V\fP, \fB\-\-version"
Print the version of
.B uconv
and exit.
.TP
.BI "\-s\fP, \fB\-\-silent"
Suppress messages during execution.
.TP
.BI "\-v\fP, \fB\-\-verbose"
Display extra informative messages during execution.
.TP
.BI "\-l\fP, \fB\-\-list"
List all the available encodings and exit.
.TP
.BI "\-l\fP, \fB\-\-list\-code" " code"
List only the
.I code
encoding and exit. If
.I code
is not a proper encoding, exit with an error.
.TP
.BI "\-\-default\-code"
List only the name of the default encoding and exit.
.TP
.BI "\-L\fP, \fB\-\-list\-transliterators"
List all the available transliterators and exit.
.TP
.BI "\-\-canon"
If used with
.BI "\-l\fP, \fB\-\-list"
or
.BR "\-\-default\-code" ,
the list of encodings is produced in a format compatible with
.BR convrtrs.txt (5).
If used with
.BR "\-L\fP, \fB\-\-list\-transliterators" ,
print only one transliterator name per line.
.TP
.BI "\-x" " transliterator"
Run the transcoded Unicode data through the given
.IR transliterator
and use the transliterated data as input for the transcoding to
the destination encoding.
.TP
.BI "\-\-to\-callback" " callback"
Use
.I callback
to handle characters that cannot be transcoded to the destination
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.B "\-c"
Omit invalid characters from the output.
Same as
.BR "\-\-to\-callback skip" .
.TP
.BI "\-\-from\-callback" " callback"
Use
.I callback
to handle characters that cannot be transcoded from the original
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.B "\-i"
Ignore invalid sequences in the input.
Same as
.BR "\-\-from\-callback skip" .
.TP
.BI "\-\-callback" " callback"
Use
.I callback
to handle both characters that cannot be transcoded from the original
encoding and characters that cannot be transcoded to the destination
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.BI "\-\-fallback"
Use the fallback mapping when transcoding from
Unicode to the destination encoding.
.TP
.BI "\-\-no\-fallback"
Do not use the fallback mapping when transcoding from Unicode to the
destination encoding.
This is the default.
.TP
.BI "\-f\fP, \fB\-\-from\-code" " encoding"
Set the original encoding of the data to
.IR encoding .
.TP
.BI "\-t\fP, \fB\-\-to\-code" " encoding"
Transcode the data to
.IR encoding .
.TP
.BI "\-o\fP, \fB\-\-output" " file"
Write the transcoded data to
.IR file .
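.PP
The options above might be combined as in the following sketch, which
first lists the available transliterators, one per line, and then runs
a conversion through one of them.
Here
.I name
stands for a transliterator name taken from that list, and
.I input.txt
is a hypothetical file name; the encoding name
.B utf\-8
is assumed to be available locally:
.PP
.RS
.nf
uconv \-L \-\-canon
uconv \-x \fIname\fP \-f utf\-8 \-t utf\-8 input.txt
.fi
.RE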
.SH CALLBACKS
.B uconv
supports specifying callbacks to handle invalid data. Callbacks can be
set for both directions of transcoding: from the original encoding to
Unicode, with the
.BR "\-\-from\-callback"
option, and from Unicode to the destination encoding, with the
.BR "\-\-to\-callback"
option.
.PP
The following is a list of valid
.I callback
names, along with a description of their behavior. The list of
callbacks actually supported by
.B uconv
is displayed when it is called with
.BR "\-h\fP, \fB\-\-help" .
.PP
.TP \w'\fBescape-unicode'u+3n
.B substitute
Write the encoding's substitute sequence, or the Unicode
replacement character
.B U+FFFD
when transcoding to Unicode.
.TP
.B skip
Ignore the invalid data.
.TP
.B stop
Stop with an error when encountering invalid data.
This is the default callback.
.TP
.B escape
Same as
.BR escape-icu .
.TP
.B escape-icu
Replace the missing characters with a string of the format
.BR %U\fIhhhh\fP ,
where
.I hhhh
is the hexadecimal value of the character.
.TP
.B escape-java
Replace the missing characters with a string of the format
.BR "\eu\fIhhhh\fP" ,
where
.I hhhh
is the hexadecimal value of the character.
.TP
.B escape-c
Replace the missing characters with a string of the format
.BR \eu\fIhhhh\fP
for plane 0 characters, and
.BR \eu\fIhhhh\fP\eu\fIhhhh\fP
for planes 1 and above characters,
where
.I hhhh
is the hexadecimal value of the character. Characters from planes 1
and above are written as surrogate pairs.
.TP
.B escape-xml
Same as
.BR escape-xml-dec .
.TP
.B escape-xml-dec
Replace the missing characters with a string of the format
.BR &#\fInnnn\fP; ,
where
.I nnnn
is the decimal value of the character.
.TP
.B escape-xml-hex
Replace the missing characters with a string of the format
.BR &#x\fIhhhh\fP; ,
where
.I hhhh
is the hexadecimal value of the character.
.TP
.B escape-unicode
Replace the missing characters with a string of the format
.BR U+\fIhhhh\fP ,
where
.I hhhh
is the hexadecimal value of the character. This is the format
universally used to denote a Unicode codepoint in the literature.
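.PP
As an illustration of how a callback might be selected, the sketch
below transcodes to an encoding that cannot represent every Unicode
character.
The file name
.I input.txt
is hypothetical, and the encoding names are assumed to be available
locally.
With the first command, characters missing from the destination
encoding should be written in the
.BI &#x hhhh ;
form described above; with the second, they are omitted from the
output:
.PP
.RS
.nf
uconv \-\-to\-callback escape\-xml\-hex \-f utf\-8 \-t iso\-8859\-1 input.txt
uconv \-c \-f utf\-8 \-t iso\-8859\-1 input.txt
.fi
.RE
.PP
Without either option, the default
.B stop
callback applies and the conversion ends with an error at the first
such character.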
.SH VERSION
@VERSION@
.SH COPYRIGHT
Copyright (C) 2001 IBM, Inc. and others.
.SH SEE ALSO
.BR convrtrs.txt (5)