.\" Hey, Emacs! This is -*-nroff-*- you know...
.\"
.\" gendict.1: manual page for the gendict utility
.\"
.\" Copyright (C) 2012 International Business Machines Corporation and others
.\"
.TH GENDICT 1 "1 June 2012" "ICU MANPAGE" "ICU @VERSION@ Manual"
.SH NAME
.B gendict
\- Compiles a word list into an ICU string trie dictionary
.SH SYNOPSIS
.B gendict
[
.BR "\fB\-\-uchars"
|
.BR "\fB\-\-bytes"
.BI "\fB\-\-transform" " transform"
]
[
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
]
[
.BR "\-V\fP, \fB\-\-version"
]
[
.BR "\-c\fP, \fB\-\-copyright"
]
[
.BR "\-v\fP, \fB\-\-verbose"
]
[
.BI "\-i\fP, \fB\-\-icudatadir" " directory"
]
.IR " input\-file"
.IR " output\-file"
.SH DESCRIPTION
.B gendict
reads the word list from
.I input\-file
and creates a string trie dictionary file. Normally this data file has the
.B .dict
extension.
.PP
Each word starts at the beginning of a line and ends at the first whitespace
character. Lines that begin with whitespace are ignored.
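.PP
As a minimal sketch of this layout, a hypothetical
.I input\-file
might look as follows. The words and values are invented for illustration, and
the sketch assumes that an integer value, when present, follows its word on the
same line, separated by whitespace.
.PP
.nf
alpha 10
beta 0x1F
 this line begins with a space, so it is ignored
.fi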
.SH OPTIONS
.TP
.BR "\-h\fP, \fB\-?\fP, \fB\-\-help"
Print help about usage and exit.
.TP
.BR "\-V\fP, \fB\-\-version"
Print the version of
.B gendict
and exit.
.TP
.BR "\-c\fP, \fB\-\-copyright"
Embed the standard ICU copyright into the
.IR output\-file .
.TP
.BR "\-v\fP, \fB\-\-verbose"
Display extra informative messages during execution.
.TP
.BI "\-i\fP, \fB\-\-icudatadir" " directory"
Look for any necessary ICU data files in
.IR directory .
For example, the file
.B pnames.icu
must be located when ICU's data is not built as a shared library.
The default ICU data directory is specified by the environment variable
.BR ICU_DATA .
Most configurations of ICU do not require this argument.
.TP
.B \-\-uchars
Set the output trie type to UChar. Mutually exclusive with
.BR \-\-bytes .
.TP
.B \-\-bytes
Set the output trie type to Bytes. Mutually exclusive with
.BR \-\-uchars .
.TP
.BI "\-\-transform" " transform"
Set the transform type. Should only be specified with
.BR \-\-bytes .
The currently supported transform is
.BR offset\-<hex\-number> ,
which specifies an offset to subtract from all input characters.
The offset transform also maps U+200D
to 0xFF and U+200C to 0xFE, for compatibility with
languages that require these characters.
A transform must be specified for a bytes trie. When applied
to the non-value characters in the
.IR input\-file ,
it must produce output between 0x00 and 0xFF.
.TP
.BI " input\-file"
The source file to read.
.TP
.BI " output\-file"
The file to write the output dictionary to.
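.PP
As a worked sketch of the offset transform, the arithmetic below assumes a
hypothetical invocation with
.BR "\-\-transform offset\-0x0E00" ;
the offset and the sample characters are chosen only for illustration.
.PP
.nf
U+0E01 \- 0x0E00 = 0x01    within 0x00 to 0xFF
U+0E5B \- 0x0E00 = 0x5B    within 0x00 to 0xFF
U+200D  maps to  0xFF    special case
U+200C  maps to  0xFE    special case
U+0041 \- 0x0E00 < 0x00    outside the permitted range
.fi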
.SH CAVEATS
The
.I input\-file
is assumed to be encoded in UTF-8.
.PP
The integers in the
.I input\-file
that are used as values must be made up of ASCII digits. They
may be specified either in hex, by using a 0x prefix, or in
decimal.
.PP
Either
.B \-\-bytes
or
.B \-\-uchars
must be specified.
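.PP
The following invocations sketch the two modes. The file names are
hypothetical, the offset value is chosen only for illustration, and the second
command assumes a word list whose characters the transform maps into the range
0x00 to 0xFF.
.PP
.nf
gendict \-\-uchars words.txt words.dict
gendict \-\-bytes \-\-transform offset\-0x0E00 words.txt words.dict
.fi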
.SH ENVIRONMENT
.TP 10
.B ICU_DATA
Specifies the directory containing ICU data. Defaults to
.BR @thepkgicudatadir@/@PACKAGE@/@VERSION@/ .
Some tools in ICU depend on the presence of the trailing slash, so make sure
that it is present if
.B ICU_DATA
is set.
.SH AUTHORS
Maxime Serrano
.SH VERSION
1.0
.SH COPYRIGHT
Copyright (C) 2012 International Business Machines Corporation and others
.SH SEE ALSO
.BR http://www.icu-project.org/userguide/boundaryAnalysis.html