.\" Hey, Emacs! This is -*-nroff-*- you know...
.\"
.\" uconv.1: manual page for the uconv utility.
.\"
.\" Copyright (C) 2000-2001 IBM, Inc. and others.
.\"
.\" Manual page by Yves Arrouye .
.\"
.TH UCONV 1 "9 November 2001" "ICU MANPAGE" "ICU @VERSION@ Manual"
.SH NAME
.B uconv
\- convert data from one encoding to another
.SH SYNOPSIS
.B uconv
[
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
]
[
.BI "\-V\fP, \fB\-\-version"
]
[
.BI "\-s\fP, \fB\-\-silent"
]
[
.BI "\-v\fP, \fB\-\-verbose"
]
[
.BI "\-l\fP, \fB\-\-list"
|
.BI "\-l\fP, \fB\-\-list\-code" " code"
|
.BI "\-\-default\-code"
|
.BI "\-L\fP, \fB\-\-list\-transliterators"
]
[
.BI "\-\-canon"
]
[
.BI "\-x" " transliterator"
]
[
.BI "\-\-to\-callback" " callback"
|
.B "\-c"
]
[
.BI "\-\-from\-callback" " callback"
|
.B "\-i"
]
[
.BI "\-\-callback" " callback"
]
[
.BI "\-\-fallback"
|
.BI "\-\-no\-fallback"
]
[
.BI "\-b\fP, \fB\-\-block\-size" " size"
]
.BI "\-f\fP, \fB\-\-from\-code" " encoding"
.BI "\-t\fP, \fB\-\-to\-code" " encoding"
[
.IR file .\|.\|.
]
[
.BI "\-o\fP, \fB\-\-output" " file"
]
.SH DESCRIPTION
.B uconv
converts, or transcodes, each given
.I file
(or standard input if no
.I file
is specified) from one
.I encoding
to another.
The transcoding is done using Unicode as a pivot encoding
(i.e. the data are first transcoded from their original encoding to
Unicode, and then from Unicode to the destination encoding).
.PP
When calling
.BR uconv ,
it is possible to specify callbacks that are used to handle invalid
characters in the input, or characters that cannot be transcoded to
the destination encoding. Some encodings, for example, offer a default
substitution character that can be used to represent the occurrence of
such characters in the input. Other callbacks offer a useful visual
representation of the invalid data.
.PP
.B uconv
can also run the transcoding through a specified
.IR transliterator ,
in which case transliteration happens as an intermediate step,
after the data have been transcoded to Unicode.
.SH OPTIONS
.TP
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
Print help about usage and exit.
.TP
.BR "\-V\fP, \fB\-\-version"
Print the version of
.B uconv
and exit.
.TP
.BI "\-s\fP, \fB\-\-silent"
Suppress messages during execution.
.TP
.BI "\-v\fP, \fB\-\-verbose"
Display extra informative messages during execution.
.TP
.BI "\-l\fP, \fB\-\-list"
List all the available encodings and exit.
.TP
.BI "\-l\fP, \fB\-\-list\-code" " code"
List only the
.I code
encoding and exit. If
.I code
is not a proper encoding, exit with an error.
.TP
.BI "\-\-default\-code"
List only the name of the default encoding and exit.
.TP
.BI "\-L\fP, \fB\-\-list\-transliterators"
List all the available transliterators and exit.
.TP
.BI "\-\-canon"
If used with
.BI "\-l\fP, \fB\-\-list"
or
.BR "\-\-default\-code" ,
the list of encodings is produced in a format compatible with
.BR convrtrs.txt (5).
If used with
.BR "\-L\fP, \fB\-\-list\-transliterators" ,
print only one transliterator name per line.
.TP
.BI "\-x" " transliterator"
Run the transcoded Unicode data through the given
.I transliterator
and use the transliterated data as input for the transcoding to
the destination encoding.
.TP
.BI "\-\-to\-callback" " callback"
Use
.I callback
to handle characters that cannot be transcoded to the destination
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.B "\-c"
Omit invalid characters from the output. Same as
.BR "\-\-to\-callback skip" .
.TP
.BI "\-\-from\-callback" " callback"
Use
.I callback
to handle characters that cannot be transcoded from the original
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.B "\-i"
Ignore invalid sequences in the input. Same as
.BR "\-\-from\-callback skip" .
.TP
.BI "\-\-callback" " callback"
Use
.I callback
to handle both characters that cannot be transcoded from the original
encoding and characters that cannot be transcoded to the destination
encoding. See section
.B CALLBACKS
for details on valid callbacks.
.TP
.BI "\-\-fallback"
Use the fallback mapping when transcoding from Unicode to the
destination encoding.
.TP
.BI "\-\-no\-fallback"
Do not use the fallback mapping when transcoding from Unicode to the
destination encoding.
This is the default.
.TP
.BI "\-b\fP, \fB\-\-block\-size" " size"
Read input in blocks of
.I size
bytes at a time. The default block size is 4096.
.TP
.BI "\-f\fP, \fB\-\-from\-code" " encoding"
Set the original encoding of the data to
.IR encoding .
.TP
.BI "\-t\fP, \fB\-\-to\-code" " encoding"
Transcode the data to
.IR encoding .
.TP
.BI "\-o\fP, \fB\-\-output" " file"
Write the transcoded data to
.IR file .
.SH CALLBACKS
.B uconv
supports specifying callbacks to handle invalid data. Callbacks can be
set for both directions of transcoding: from the original encoding to
Unicode, with the
.BR "\-\-from\-callback"
option, and from Unicode to the destination encoding, with the
.BR "\-\-to\-callback"
option.
.PP
The following is a list of valid
.I callback
names, along with a description of their behavior. The list of
callbacks actually supported by
.B uconv
is displayed when it is called with
.BR "\-h\fP, \fB\-\-help" .
.PP
.TP \w'\fBescape-unicode'u+3n
.B substitute
Write the encoding's substitute sequence, or the Unicode
replacement character
.B U+FFFD
when transcoding to Unicode.
.TP
.B skip
Ignore the invalid data.
.TP
.B stop
Stop with an error when encountering invalid data.
This is the default callback.
.TP
.B escape
Same as
.BR escape-icu .
.TP
.B escape-icu
Replace the missing characters with a string of the format
.BR %U\fIhhhh\fP
for plane 0 characters, and
.BR %U\fIhhhh\fP%U\fIhhhh\fP
for characters in planes 1 and above, where
.I hhhh
is the hexadecimal value of one of the UTF-16 code units representing the
character. Characters from planes 1 and above are written as a pair of
UTF-16 surrogate code units.
.TP
.B escape-java
Replace the missing characters with a string of the format
.BR \eu\fIhhhh\fP
for plane 0 characters, and
.BR \eu\fIhhhh\fP\eu\fIhhhh\fP
for characters in planes 1 and above, where
.I hhhh
is the hexadecimal value of one of the UTF-16 code units representing the
character. Characters from planes 1 and above are written as a pair of
UTF-16 surrogate code units.
.TP
.B escape-c
Replace the missing characters with a string of the format
.BR \eu\fIhhhh\fP
for plane 0 characters, and
.BR \eU\fIhhhhhhhh\fP
for characters in planes 1 and above, where
.I hhhh
and
.I hhhhhhhh
are the hexadecimal values of the Unicode codepoint.
.TP
.B escape-xml
Same as
.BR escape-xml-dec .
.TP
.B escape-xml-dec
Replace the missing characters with a string of the format
.BR &#\fInnnn\fP; ,
where
.I nnnn
is the decimal value of the Unicode codepoint.
.TP
.B escape-xml-hex
Replace the missing characters with a string of the format
.BR &#x\fIhhhh\fP; ,
where
.I hhhh
is the hexadecimal value of the Unicode codepoint.
.TP
.B escape-unicode
Replace the missing characters with a string of the format
.BR {U+\fIhhhh\fP} ,
where
.I hhhh
is the hexadecimal value of the Unicode codepoint. That hexadecimal
string is of variable length and can use from 4 to 6 digits.
This is the format universally used to denote a Unicode codepoint in
the literature, delimited by curly braces for easy recognition of those
substitutions in the output.
.SH FILES
.TP 15
.B uconvmsg.dat
Compiled resource bundle containing localized messages printed by
.BR uconv .
.SH VERSION
@VERSION@
.SH COPYRIGHT
Copyright (C) 2001 IBM, Inc. and others.
.SH SEE ALSO
.BR convrtrs.txt (5)