<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="COPYRIGHT" content=
"Copyright (c) IBM Corporation and others. All Rights Reserved.">
<meta name="KEYWORDS" content=
"ICU; International Components for Unicode; what's new; readme; read me; introduction; downloads; downloading; building; installation;">
<meta name="DESCRIPTION" content=
"The introduction to the International Components for Unicode with instructions on building, installation, usage and other information about ICU.">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<title>ReadMe for ICU</title>
<style type="text/css">
h1 {border-width: 2px; border-style: solid; text-align: center; width: 100%; font-size: 200%; font-weight: bold}
h2 {margin-top: 3em; text-decoration: underline; page-break-before: always}
h2.TOC {page-break-before: auto}
h3 {margin-top: 2em; text-decoration: underline}
h4 {text-decoration: underline}
h5 {text-decoration: underline}
caption {font-weight: bold; text-align: left}
div.indent {margin-left: 2em}
ul.TOC {list-style-type: none}
samp {margin-left: 2em; border-style: groove; padding: 1em; display: block; background-color: #EEEEEE}
</style>
</head>

<body lang="en-US">
<h1>International Components for Unicode<br>
ICU 2.0 ReadMe</h1>

<p>Version: 2002-Feb-04<br>
Copyright © 1997-2002 International Business Machines Corporation and
others. All Rights Reserved.</p>
<!-- Remember that there is a copyright at the end too -->
<hr>

<h2 class="TOC">Table of Contents</h2>

<ul class="TOC">
<li><a href="#Introduction">Introduction</a></li>

<li><a href="#GettingStarted">Getting started</a></li>

<li><a href="#News">What is new in this release?</a></li>

<li><a href="#Download">How to Download the Source Code</a></li>

<li><a href="#SourceCode">ICU Source Code Organization</a></li>

<li>
<a href="#HowToBuild">How to Build And Install ICU</a>

<ul class="TOC">
<li><a href="#HowToBuildSupported">Supported Platforms</a></li>

<li><a href="#HowToBuildWindows">Windows</a></li>

<li><a href="#HowToBuildUnix">Unix</a></li>

<li><a href="#HowToBuildOS390">OS/390 (zSeries)</a></li>

<li><a href="#HowToBuildOS400">OS/400 (iSeries)</a></li>
</ul>
</li>

<li>
<a href="#ImportantNotes">Important Notes About Using ICU</a>

<ul class="TOC">
<li><a href="#ImportantNotesWindows">Windows Platform</a></li>

<li><a href="#ImportantNotesUnix">Unix Type Platforms</a></li>

<li><a href="#ImportantNotesDefaultCP">Using the default
codepage</a></li>

<li><a href="#ImportantNotesDeprecatedAPI">Methods for enabling
deprecated APIs</a></li>
</ul>
</li>

<li><a href="#PlatformDependencies">Platform Dependencies</a></li>
</ul>
<hr>

<h2><a name="Introduction" href="#Introduction">Introduction</a></h2>

<p>Today's software market is a global one in which it is desirable to
develop and maintain one application (single source/single binary) that
supports a wide variety of languages. The International Components for
Unicode (C/C++) provides tools to help write platform-independent
applications that are internationalized and localized, with the following
features:</p>

<ul>
<li>Support for the latest version of the Unicode standard</li>

<li>Character set conversions, with support for over 200 codepages</li>

<li>Locale data for more than 160 locales</li>

<li>Text collation (sorting) based on the Unicode Collation Algorithm
(aligned with ISO 14651), customizable and tailored for national standards</li>

<li>Transliteration services for script-to-script transliterations
and general text operations</li>

<li>Resource bundles for storing and accessing localized information</li>

<li>Date/Number/Message formatting and parsing of culture-specific
input/output formats</li>

<li>Text boundary analysis for finding character, word, and sentence
boundaries</li>
</ul>

<p>ICU has a sister project, <a href=
"http://oss.software.ibm.com/icu4j/">ICU4J</a>, that extends the
internationalization capabilities of Java to a level similar to ICU. The
ICU C/C++ project is also called ICU4C when a distinction is necessary.</p>

<h2><a name="GettingStarted" href="#GettingStarted">Getting
started</a></h2>

<p>This document describes how to build and install ICU on your machine.
For other information about ICU, please see the following table of
links.<br>
The ICU homepage also links to related information about writing
internationalized software.</p>

<table border="1" cellpadding="3" width="100%" summary="">
<caption>
Here are some useful links regarding ICU and internationalization in
general.
</caption>

<tr>
<td>ICU Homepage</td>

<td><a href=
"http://oss.software.ibm.com/icu/">http://oss.software.ibm.com/icu/</a></td>
</tr>

<tr>
<td>ICU4J Homepage</td>

<td><a href=
"http://oss.software.ibm.com/icu4j/">http://oss.software.ibm.com/icu4j/</a></td>
</tr>

<tr>
<td>FAQ - Frequently Asked Questions about ICU</td>

<td><a href=
"http://oss.software.ibm.com/icu/userguide/icufaq.html">http://oss.software.ibm.com/icu/userguide/icufaq.html</a></td>
</tr>

<tr>
<td>ICU User's Guide</td>

<td><a href=
"http://oss.software.ibm.com/icu/userguide/">http://oss.software.ibm.com/icu/userguide/</a></td>
</tr>

<tr>
<td>Download ICU Releases</td>

<td><a href=
"http://oss.software.ibm.com/icu/download/">http://oss.software.ibm.com/icu/download/</a></td>
</tr>

<tr>
<td>API Documentation Online</td>

<td><a href=
"http://oss.software.ibm.com/icu/apiref/">http://oss.software.ibm.com/icu/apiref/</a></td>
</tr>

<tr>
<td>Online ICU Demos</td>

<td><a href=
"http://oss.software.ibm.com/icu/demo/">http://oss.software.ibm.com/icu/demo/</a></td>
</tr>

<tr>
<td>Contacts &amp; Bug Reports/Feature Requests</td>

<td><a href=
"http://oss.software.ibm.com/icu/archives/">http://oss.software.ibm.com/icu/archives/</a></td>
</tr>
</table>

<p><strong>Important:</strong> Please make sure you understand the <a href=
"license.html">Copyright and License Information</a>.</p>

<h2><a name="News" href="#News">What is new in this release?</a></h2>

<p>The following list concentrates on changes that affect existing
applications migrating from previous ICU releases. For more news about this
release, see the <a href=
"http://oss.software.ibm.com/icu/download/2.0/">ICU 2.0 download
page</a>.</p>

<h3>Support for Unicode 3.1.1</h3>

<p>ICU 2.0 has been upgraded to support <a href=
"http://www.unicode.org/unicode/standard/versions/Unicode3.1.1.html">Unicode
3.1.1</a>, which includes the addition of 44,946 new encoded characters.
These characters cover several historic scripts, several sets of symbols,
and a very large collection of additional CJK ideographs.</p>

<p>As part of this upgrade, a number of ICU services have been reviewed and
improved with regard to handling supplementary characters (surrogate
pairs). In particular, normalization has been revamped to support
supplementary characters and to improve performance.</p>

<h3>Euro transition</h3>

<p>Locale data for countries that are switching their national currencies
to the Euro has been updated to use the Euro symbol and appropriate currency
formatting. The old data is available in _PREEURO locale variants. The
_EURO variant selector can still be used to unambiguously get Euro currency
symbol formatting. For some time around the transition, software should
explicitly specify the _PREEURO or _EURO variant to make sure that it gets
the intended currency format.</p>

<p>For more on this topic, see the <a href=
"http://www.ibm.com/developerworks/unicode/library/u-euro/">developerWorks
article "Are you really ready for the Euro?"</a>.</p>

<h3>API changes</h3>

<p>Functions that take C-style string input arguments with <code>const UChar
*src</code> and <code>int32_t srcLength</code> now consistently treat
<code>srcLength==-1</code> as meaning that the input string is NUL-terminated;
they then compute <code>srcLength=u_strlen(src)</code> themselves.</p>

<p>Functions that take C-style string output arguments with <code>UChar
*dest</code> and <code>int32_t destCapacity</code> now handle NUL-termination
of the output string consistently. If the output length is less than
<code>destCapacity</code>, the output is NUL-terminated. If the output length
is equal to <code>destCapacity</code>, then <code>dest</code> is filled with
the output string without a terminating NUL, and a warning code is set. For
details about string handling see the <a href=
"http://oss.software.ibm.com/icu/userguide/strings.html">User's Guide
Strings chapter</a>.</p>
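<p>The two conventions above can be sketched in self-contained C. This is a minimal illustration, not ICU's implementation: <code>UChar16</code>, <code>SketchStatus</code>, <code>sketch_strlen()</code> and <code>sketch_copy()</code> are hypothetical names chosen so that the example compiles without ICU headers. The status values only mirror ICU's sign convention (warnings below zero, errors above zero) and the roles of ICU's <code>U_STRING_NOT_TERMINATED_WARNING</code> and <code>U_BUFFER_OVERFLOW_ERROR</code>. The overflow case, where the full length is returned so that the caller can allocate a larger buffer, is an assumption about behavior the text above does not spell out.</p>

```c
#include <stdint.h>

/* Hypothetical stand-in for ICU's UChar (a UTF-16 code unit). */
typedef uint16_t UChar16;

/* Hypothetical status codes; only the sign convention follows ICU. */
typedef enum {
    SKETCH_ZERO_ERROR = 0,
    SKETCH_STRING_NOT_TERMINATED_WARNING = -1, /* role of U_STRING_NOT_TERMINATED_WARNING */
    SKETCH_BUFFER_OVERFLOW_ERROR = 1           /* role of U_BUFFER_OVERFLOW_ERROR */
} SketchStatus;

/* Length of a NUL-terminated UTF-16 string (the role u_strlen() plays). */
static int32_t sketch_strlen(const UChar16 *s) {
    int32_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

/* Copies src to dest following the input and output conventions above.
   Always returns the full output length, even when it does not fit. */
static int32_t sketch_copy(const UChar16 *src, int32_t srcLength,
                           UChar16 *dest, int32_t destCapacity,
                           SketchStatus *status) {
    *status = SKETCH_ZERO_ERROR;
    if (srcLength == -1) {
        srcLength = sketch_strlen(src); /* -1 means "input is NUL-terminated" */
    }
    if (srcLength > destCapacity) {
        *status = SKETCH_BUFFER_OVERFLOW_ERROR; /* nothing is written */
        return srcLength;
    }
    for (int32_t i = 0; i < srcLength; ++i) {
        dest[i] = src[i];
    }
    if (srcLength < destCapacity) {
        dest[srcLength] = 0; /* room left over: NUL-terminate */
    } else {
        *status = SKETCH_STRING_NOT_TERMINATED_WARNING; /* exactly full: no NUL */
    }
    return srcLength;
}
```

<p>Calling such a function first with a <code>destCapacity</code> of 0 reports the required length without writing anything, which is a common way to size the buffer before the real call.</p>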

<p>Some APIs had been <i>deprecated</i> for a long time (more than a year)
and have now been removed.<br>
Some other APIs have been marked as <i>deprecated</i> because they have been
replaced by improved APIs; the newly deprecated APIs will remain available for
another year. In particular, the C++ classes UnicodeConverter, Unicode, and
BiDi are deprecated in favor of the equally powerful C APIs.<br>
A few <i>draft</i> APIs have changed, especially for transliteration.</p>

<p>APIs that take a rules or pattern string (for collation,
transliteration, message formats, etc.) now also take a
<code>UParseError</code> structure that is filled with useful debugging
information when a rule syntax error is detected. This makes it easier to
find problems in large rule sets. As a result, the signatures of some
functions have changed. The old signatures will remain available for about a
year by #defining a constant. See the affected header files for details.</p>
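<p>To illustrate the kind of information a parse-error structure carries, here is a small, self-contained C sketch. The struct is only loosely modeled on ICU's <code>UParseError</code>, which records a line, an offset, and short pre- and post-context strings; <code>SketchParseError</code>, <code>sketch_check_rules()</code> and the toy rule syntax (each <code>;</code>-terminated rule must contain a <code>&gt;</code>) are hypothetical and far simpler than ICU's real rule grammars.</p>

```c
#include <string.h>

#define SKETCH_CONTEXT_LEN 16 /* buffer size, including the terminating NUL */

/* Hypothetical error record, loosely modeled on ICU's UParseError. */
typedef struct {
    int offset;                           /* offset of the bad rule in the text */
    char preContext[SKETCH_CONTEXT_LEN];  /* text just before the offset */
    char postContext[SKETCH_CONTEXT_LEN]; /* text starting at the offset */
} SketchParseError;

/* Copies up to SKETCH_CONTEXT_LEN-1 characters on each side of offset. */
static void sketch_fill_context(const char *text, size_t len, size_t offset,
                                SketchParseError *err) {
    const size_t max = SKETCH_CONTEXT_LEN - 1;
    size_t preStart = offset > max ? offset - max : 0;
    size_t preLen = offset - preStart;
    size_t postLen = len - offset > max ? max : len - offset;

    err->offset = (int)offset;
    memcpy(err->preContext, text + preStart, preLen);
    err->preContext[preLen] = '\0';
    memcpy(err->postContext, text + offset, postLen);
    err->postContext[postLen] = '\0';
}

/* Toy rule checker: every ';'-terminated rule must contain '>'.
   Returns 0 on success, or 1 after filling *err for the first bad rule. */
static int sketch_check_rules(const char *rules, SketchParseError *err) {
    size_t len = strlen(rules);
    size_t ruleStart = 0;
    int sawArrow = 0;

    for (size_t i = 0; i < len; ++i) {
        if (rules[i] == '>') {
            sawArrow = 1;
        } else if (rules[i] == ';') {
            if (!sawArrow) {
                sketch_fill_context(rules, len, ruleStart, err);
                return 1;
            }
            sawArrow = 0;
            ruleStart = i + 1;
        }
    }
    return 0;
}
```

<p>A caller can print the pre-context, a marker, and the post-context to show exactly where in a long rule set the problem starts.</p>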

<p>The C++ Normalizer class had a partially broken model for iterative
normalization; this has been redone in a more consistent way. See the <a href=
"http://oss.software.ibm.com/icu/apiref/class_Normalizer.html">Normalizer
API documentation</a> for details.</p>

<h3>Memory and resource cleanup</h3>

<p>ICU is carefully tested for memory leaks. Some memory is held in
internal caches that are not released during normal operation.
These are not leaks because ICU continues to use them as necessary.</p>

<p>For testing purposes (for memory leaks) and for a small number of
applications it can be useful to release all the memory that is allocated by
a library. ICU 2.0 supports this with a new function <code><a href=
"http://oss.software.ibm.com/icu/apiref/uclean_h.html">u_cleanup()</a></code>
that may be called after an application has released all ICU objects.
<code>u_cleanup()</code> will then release all of ICU's internal memory.
The ICU libraries can then even be unloaded cleanly without shutting down
the process.</p>
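<p>The pattern described above, an internal cache that lives as long as the process plus an explicit cleanup call, can be sketched as follows. This is not ICU code: <code>sketch_get_table()</code>, <code>sketch_cleanup()</code> and the cached table are hypothetical stand-ins that show why such memory is not a leak and what a function like <code>u_cleanup()</code> is expected to do.</p>

```c
#include <stdlib.h>

/* Hypothetical internal cache: built on first use, then reused. */
static int *sketch_cache = NULL;

/* Returns the cached table, creating it on first use (NULL if allocation
   fails). Repeated calls reuse the same memory, so it is not a leak. */
static const int *sketch_get_table(void) {
    if (sketch_cache == NULL) {
        sketch_cache = malloc(256 * sizeof *sketch_cache);
        if (sketch_cache != NULL) {
            for (int i = 0; i < 256; ++i) {
                sketch_cache[i] = i * i; /* stand-in for expensive-to-load data */
            }
        }
    }
    return sketch_cache;
}

/* Releases all internal memory. Only safe once nothing still uses the
   table, mirroring the rule that u_cleanup() is called after all ICU
   objects have been released. */
static void sketch_cleanup(void) {
    free(sketch_cache);
    sketch_cache = NULL;
}
```

<p>This sketch is not thread-safe; it only shows the life cycle. After cleanup, the next call rebuilds the cache from scratch.</p>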

<h3>ICU versioning - C++ namespaces</h3>

<p>Beginning with ICU 2.0, multiple releases of ICU can be used in the same
process. Together with an arbitrary number of post-2.0 releases, one
pre-2.0 release can be loaded and active.</p>

<p>This is achieved by renaming all library exports to include a release
number suffix. Each global function and each class is renamed in this way
using a header file with #defines. For C++, if the compiler supports
namespaces, all ICU C++ classes are defined in the "icu" namespace. If the
compiler does not support namespaces, then the classes are renamed instead.
This change also reduces the chance of naming collisions with other
libraries.</p>
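<p>A minimal C sketch of the renaming technique: a <code>#define</code> maps the public name onto a versioned symbol, so source code keeps calling the unversioned name while the object file exports a name that includes the release number. The function name <code>ver_strlen</code> and the <code>_2_0</code> suffix are hypothetical; ICU's own renaming header defines its real names and suffixes.</p>

```c
#include <stddef.h>

/* Renaming header (normally a separate .h file included everywhere):
   every use of the public name becomes the versioned name. */
#define ver_strlen ver_strlen_2_0

/* The definition is renamed too, so the exported symbol is
   ver_strlen_2_0. A later release could export ver_strlen_2_2 into
   the same process without a symbol clash. */
size_t ver_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```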

<p>For details see the <a href=
"http://oss.software.ibm.com/icu/userguide/design.html">User's Guide Design
Chapter</a>.</p>

<h3>Data loading changed</h3>

<p>ICU data loading is simplified for most users. By default, the ICU build
creates a data DLL/shared library that is linked directly with the common
library (<code>[lib]icuuc</code>). If all ICU libraries, including the data
library, are placed into the same folder, ICU should start up and find its
data immediately. Dynamic loading of data from DLLs/shared libraries is no
longer supported.</p>

<p>Before ICU 2.0, ICU did not itself link directly with its data library,
but some ICU applications (like the Xerces XML parser) did, and they called
<code>udata_setCommonData()</code>. This is no longer necessary in the
default case.<br>
On the other hand, this same technique can now be used to efficiently load
an application's own data (e.g., for its own localization). An application
can build a data DLL/library of its own, link it, and call the new API
<code>udata_setAppData()</code>.</p>

<p>For details on finding and loading ICU data and on options for portable,
common data files etc., see the <a href=
"http://oss.software.ibm.com/icu/userguide/icudata.html">User's Guide ICU
Data Chapter</a>.</p>

<h3>Collation improvements</h3>

<p>The performance of Japanese Katakana collation has been improved, and the
Japanese collation has been changed for conformance with the JIS X 4061
standard. The improvement is in the handling of the length and iteration
marks, making the processing of regular letters faster.</p>

<p>The JIS X 4061 standard specifies a 5-level sorting algorithm. Sorting
with all five levels according to JIS is achieved in ICU 2.0 with the
"identical" strength. The fifth level distinguishes regular character codes
from compatibility variants.</p>

<p>There is special code to handle the fourth (quaternary) level of the
JIS standard, which distinguishes between Hiragana and Katakana letters. In
ICU 2.0 string comparisons (like <code>ucol_strcoll()</code>), when the
"shifted" option is used, this is slow because it generates complete sort
keys for both strings. This is not an issue if the "shifted" option is not
used, or if the string comparison is done with fewer levels.</p>

<p>Quaternary strength, without the "shifted" option, is the default for
Japanese collation in ICU 2.0.</p>

<p>Three-level sorting (tertiary strength) or fewer levels, if sufficient,
is faster even with "shifted" on (for string comparisons it is
<em>much</em> faster in this case).</p>
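<p>Why fewer levels can be cheaper is easier to see in a simplified, self-contained C sketch of level-by-level comparison: all primary weights are compared before any secondary weight, and so on, and the strength limits how many levels are examined. The three-level <code>SketchCE</code> element and its weights are hypothetical; real collation elements, ignorable characters, the "shifted" option, and the JIS quaternary and identical levels are not modeled.</p>

```c
#include <stddef.h>

/* Hypothetical, greatly simplified collation element:
   weight[0] = primary (base letter), weight[1] = secondary (accents),
   weight[2] = tertiary (case and variants). */
typedef struct {
    unsigned short weight[3];
} SketchCE;

/* Compares two element sequences level by level, the way a sort key orders
   all primary weights before any secondary weight. strength is the number
   of levels to examine (1..3). Returns <0, 0, or >0. */
static int sketch_compare(const SketchCE *a, size_t aLen,
                          const SketchCE *b, size_t bLen, int strength) {
    for (int level = 0; level < strength && level < 3; ++level) {
        size_t n = aLen < bLen ? aLen : bLen;
        for (size_t i = 0; i < n; ++i) {
            if (a[i].weight[level] != b[i].weight[level]) {
                return a[i].weight[level] < b[i].weight[level] ? -1 : 1;
            }
        }
        if (aLen != bLen) {
            return aLen < bLen ? -1 : 1; /* a proper prefix sorts first */
        }
    }
    return 0;
}
```

<p>With strength 2 a pure case difference is invisible; with strength 3 it decides the order, but only after the secondary level has found no difference. Each additional level is another pass over the data in this model.</p>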

<h3>License Change (for ICU 1.8.1 and up)</h3>

<p>The ICU projects (ICU4C and ICU4J) have changed their licenses from the
IPL (IBM Public License) to the X license. The X license is a non-viral and
recommended free software license that is compatible with the GNU GPL.
This is effective starting with release 1.8.1 of ICU4C and release
1.3.1 of ICU4J. All previous ICU releases will continue to use the IPL.
New ICU releases will adopt the X license. Users of previous releases
of ICU will need to accept the terms and conditions of the X license in
order to adopt the new ICU releases.</p>

<p>The main effect of the change is to provide GPL compatibility. The X
license is listed as GPL compatible; see the GNU page at <a href=
"http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses">http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses</a>.</p>

<p>The text of the X license is available at <a href=
"http://www.x.org/terms.htm">http://www.x.org/terms.htm</a>. The IBM
version contains the essential text of the license, omitting the X-specific
trademarks and copyright notices.</p>

<p>For more details please see the <a href=
"http://oss.software.ibm.com/icu/press.html">press announcement</a> and the
<a href="http://oss.software.ibm.com/icu/project_faq.html#license">Project
FAQ</a>.</p>

<h3>Transliterator improvements</h3>

<p>The transliterator service has undergone an extensive overhaul, in both
the rule-based engine and the built-in system rules. For a complete
description see the <a href=
"http://oss.software.ibm.com/icu/userguide/Transliteration.html">User's
Guide chapter on transliteration</a>.</p>

<ul>
<li><b>New or rewritten rules:</b> <tt>Any-Accents</tt>,
<tt>Any-Publishing</tt>, <tt>Cyrillic-Latin</tt>*, <tt>Greek-Latin</tt>*,
<tt>Greek-Latin/UNGEGN</tt> (aka <tt>el-Latin</tt>),
<tt>Hiragana-Latin</tt>*, and <tt>Latin-Katakana</tt>*. New algorithmic
rules include <tt>Any-Name</tt>*, the normalization rules
<tt>Any-NFC</tt>, <tt>Any-NFKC</tt>, <tt>Any-NFD</tt>, and
<tt>Any-NFKD</tt>, and the casing rules <tt>Any-Upper</tt>,
<tt>Any-Lower</tt>, and <tt>Any-Title</tt>. <tt>Unicode-Hex</tt>* has been
renamed <tt>Any-Hex</tt>*. <tt>Any-Remove</tt> deletes its input.
[*<em>applies to reverse rule as well</em>]</li>

<li><b>Indic script rules:</b> Transliterators between Indic scripts and
from each script to and from Latin have been completely revised. Scripts
included are Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam,
Oriya, Tamil, and Telugu. Taking Bengali as an example, transliterators
<tt>Bengali-X</tt> and <tt>X-Bengali</tt> exist, where X is any of the
other listed Indic scripts, or Latin.</li>

<li><b>Deleted rules:</b> <tt>UnicodeName-UnicodeChar</tt> has been
replaced by <tt>Any-Name</tt>*. <tt>Latin-Arabic</tt>* and
<tt>Latin-Hebrew</tt>* have been removed until they can be rewritten.
<tt>KeyboardEscape-Latin1</tt> has been replaced by <tt>Any-Accents</tt>
and <tt>Any-Publishing</tt>. <tt>Latin-Kana</tt>* has been replaced by
<tt>Latin-Katakana</tt>* and <tt>Latin-Hiragana</tt>*. [*<em>applies to
reverse rule as well</em>]</li>

<li><b>ID syntax changes:</b> Transliterator IDs now ignore case and
whitespace. They have the standard form
<em>[filter]source-target/variant</em>. The "<em>[filter]</em>" element
is optional; if present, it limits the characters that the transliterator
operates on. The "<em>source-</em>" element is optional; if omitted, it
is taken to be <tt>Any</tt>. The "<em>/variant</em>" element is also
optional; if present, it selects between different flavors of a related
set of transliterators, for example, <tt>Greek-Latin</tt> and
<tt>Greek-Latin/UNGEGN</tt>. The source, target, and variant specifiers
are case-insensitive strings of the form
<tt>/[_[:L:]][_[:L:][:N:]]*/</tt>.</li>

<li>
<b>Locale support:</b> The source, target, or both may be locales. In
this case the transliterator rules will be looked up in the system
locale resource bundles. Rules are sought under three tags, listed
below. The text after the underscore in each tag is always
canonicalized to uppercase before lookup. <em>Note: The underscore is
currently omitted from ICU4C tags, but will be restored when
possible.</em>

<ul>
<li><tt>TransliterateTo_<em>SCRIPT</em></tt>: Unidirectional rules
from the enclosing locale to another script or specifier.</li>

<li><tt>TransliterateFrom_<em>SCRIPT</em></tt>: Unidirectional rules
from another script or specifier to the enclosing locale.</li>

<li><tt>Transliterate_<em>SCRIPT</em></tt>: Bidirectional rules, with
the forward direction being To and the reverse direction being
From.</li>
</ul>
Lookup proceeds in the following order:

<ul>
<li>In the dynamic registry: <em>source-target</em></li>

<li>In the <em>source</em> locale:
<tt>TransliterateTo_<em>TARGET</em></tt> then
<tt>Transliterate_<em>TARGET</em></tt> (forward direction)</li>

<li>In the <em>target</em> locale:
<tt>TransliterateFrom_<em>SOURCE</em></tt> then
<tt>Transliterate_<em>SOURCE</em></tt> (reverse direction)</li>
</ul>
If either the source or target specifier is not a locale, then the
corresponding locale lookup is skipped. If either is a locale, then
locale fallback from <tt>aa_BB_CCC</tt> to <tt>aa_BB</tt> to
<tt>aa</tt> is performed (where <tt>aa</tt>, <tt>BB</tt>, and
<tt>CCC</tt> are the locale language, country, and variant). The final
fallback is from the specifier, whether it is a locale or not (e.g., a
script abbreviation), to the long script name associated with that
specifier. If a tag lookup succeeds, the attached element should be a
string array of <i>2n</i> items where <i>n</i> &gt;= 1. Each pair of
strings is a variant name and rule string. The variants are matched
against the requested variant. If no variant is specified, then the
first variant is considered to match.
</li>

<li><b>Filters on compound IDs:</b> A filter on a compound
transliterator can now be specified by giving a leading entry that
contains a filter and no transliterator ID. For example, "<tt>[abc];
Latin-Katakana; Katakana-Hiragana</tt>" submits only the characters
contained in the UnicodeSet <tt>[abc]</tt> to the compound transliterator
<tt>Latin-Katakana; Katakana-Hiragana</tt>.</li>

<li><b>Explicit reverse IDs:</b> Typically, if a transliterator
<tt>A-B</tt> is formed and its inverse is requested, the system tries to
create <tt>B-A</tt>; that is, the source and target are exchanged. In
some cases, the user may want a different transliterator to be considered
the reverse. To do this, the reverse ID is specified in
parentheses immediately following the ID. For example, "<tt>A-B
(B-C)</tt>" is a transliterator <tt>A-B</tt> whose inverse is
<tt>B-C</tt>. If the ID of the inverse is requested, "<tt>B-C (A-B)</tt>"
is returned. The forward or reverse component may be empty, so
"<tt>(B-C)</tt>" and "<tt>A-B()</tt>" are legal IDs with a <tt>Null</tt>
transliterator for the forward and reverse direction, respectively. This
is most useful in compounds where one element has no inverse or where an
inverse other than the standard one is desired. For example,
"<tt>Any-Lower(); Latin-Cyrillic</tt>".</li>

<li><b>Quantifiers:</b> Transliterator rules may now contain the quantifiers
'<tt>*</tt>', '<tt>+</tt>', and '<tt>?</tt>'. These indicate zero or
more, one or more, and zero or one matches, respectively. Quantifiers
apply to the last element, be it a single character, a UnicodeSet, a
segment definition, or a quote; the entire preceding element is repeated.
Quantifiers are implemented as greedy, non-backtracking matchers, unlike
their typical implementation in regular expressions. As a result, some
expressions that match in a traditional regular expression engine (e.g.,
Perl) will not match in a transliterator. For example,
"<tt>[a-z]+ q &gt; x;</tt>" will <em>not</em> match "abcq", since the
'<tt>+</tt>' quantifier consumes all four characters, including the
"q".</li>

<li><b>Dot character:</b> A new special character is recognized in rules,
'<tt>.</tt>' (U+002E). This character matches any character in the set
<tt>[^[:Zp:][:Zl:]\r\n$]</tt>. Note the trailing '<tt>$</tt>' in the set
pattern, which indicates that the ETHER character is <em>not</em> matched
by '<tt>.</tt>'.</li>

<li><b>::ID blocks in rules:</b> Transliterator IDs may now be included
in rule sets. These may occur in two locations: as one contiguous block
before any other rules, and as one contiguous block after all rules. The
effect of placing <tt>::ID</tt>s into a rule set is to enclose the
rule-based transliterator within a compound transliterator containing the
indicated IDs. The <tt>::ID</tt> syntax is exactly the same as the
standard ID syntax, with the difference that each ID element is preceded
by the special token "<tt>::</tt>".</li>

<li><b>Segment definitions more flexible:</b> Segment definitions may be
nested and are now unlimited in number. Prior to 2.0, segments could not
be nested and were limited to nine ($1 to $9).</li>

<li><b>Variable range pragma:</b> A new pragma is supported. It follows
the syntax <code>use variable range 0xE800 0xEFFF;</code> (any two code
points may be specified). The code points are specified as decimal
constants, octal constants with a leading '0', or hexadecimal constants
with a leading "0x". The given range is used internally for stand-in
characters during processing. The default range is <b>0xF000..0xF8FF</b>.
If a rule set explicitly uses characters in the default variable range, a
new range, not containing any characters in use in the rule set, must be
specified. <em>Note:</em> This is the first of several planned
pragmas.</li>

<li><b>Factory method registration:</b> Factory methods (function
pointers in ICU4C; functor objects in ICU4J) may be registered against
transliterator IDs. This is generally more efficient than the
registration of singleton prototypes, since no actual transliterator
object needs to be created until the user requires one. See the
<tt>registerFactory()</tt> method in <tt>Transliterator</tt>.</li>

<li><b>Filtering semantics changed for subclasses:</b> Subclasses no longer
need to concern themselves with filters. Instead, they may assume that
all characters received by <tt>handleTransliterate()</tt> have already
passed through the filter. This simplifies subclass code greatly.</li>
</ul>
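<p>The <em>[filter]source-target/variant</em> ID form described above can be illustrated with a small C parser. This is a hypothetical sketch, not ICU's parser: <code>SketchTransID</code> and <code>sketch_parse_id()</code> are invented names, the filter is taken to be a leading bracketed pattern matched by counting brackets (escapes are not handled), and whitespace removal, case folding, compound IDs and explicit reverse IDs are left out. It does show the default source of <tt>Any</tt> when the "source-" element is omitted.</p>

```c
#include <string.h>

#define SKETCH_FIELD_LEN 64

/* Hypothetical parsed form of "[filter]source-target/variant". */
typedef struct {
    char filter[SKETCH_FIELD_LEN];
    char source[SKETCH_FIELD_LEN];
    char target[SKETCH_FIELD_LEN];
    char variant[SKETCH_FIELD_LEN];
} SketchTransID;

/* Copies n characters of src into a NUL-terminated field; fails if too long. */
static int sketch_copy_field(char *dst, const char *src, size_t n) {
    if (n >= SKETCH_FIELD_LEN) {
        return 0;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 1;
}

/* Splits id into its parts. Returns 1 on success, 0 on malformed input. */
static int sketch_parse_id(const char *id, SketchTransID *out) {
    const char *p = id;
    memset(out, 0, sizeof *out);

    if (*p == '[') { /* optional leading filter, brackets may nest */
        int depth = 0;
        const char *start = p;
        do {
            if (*p == '[') ++depth;
            else if (*p == ']') --depth;
            else if (*p == '\0') return 0; /* unbalanced brackets */
            ++p;
        } while (depth > 0);
        if (!sketch_copy_field(out->filter, start, (size_t)(p - start))) return 0;
    }

    const char *slash = strchr(p, '/');
    size_t stLen = slash ? (size_t)(slash - p) : strlen(p);
    const char *dash = memchr(p, '-', stLen);

    if (dash) { /* "source-target" */
        size_t srcLen = (size_t)(dash - p);
        if (!sketch_copy_field(out->source, p, srcLen)) return 0;
        if (!sketch_copy_field(out->target, dash + 1, stLen - srcLen - 1)) return 0;
    } else { /* "target" alone: the source defaults to Any */
        strcpy(out->source, "Any");
        if (!sketch_copy_field(out->target, p, stLen)) return 0;
    }
    if (slash && !sketch_copy_field(out->variant, slash + 1, strlen(slash + 1))) return 0;
    return out->target[0] != '\0';
}
```

<p>For example, <tt>Greek-Latin/UNGEGN</tt> yields source <tt>Greek</tt>, target <tt>Latin</tt> and variant <tt>UNGEGN</tt>, while <tt>[[:Lu:]a]Hex</tt> yields the filter <tt>[[:Lu:]a]</tt>, source <tt>Any</tt> and target <tt>Hex</tt>.</p>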
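<p>The locale fallback chain <tt>aa_BB_CCC</tt> to <tt>aa_BB</tt> to <tt>aa</tt> described under "Locale support" amounts to repeatedly truncating the locale ID at its last underscore. The following C sketch shows only that step; <code>sketch_locale_parent()</code> is a hypothetical helper, and both the resource bundle lookups and the final fallback to the long script name are omitted.</p>

```c
#include <string.h>

/* Hypothetical helper: truncates a locale ID to its parent in place
   ("aa_BB_CCC" -> "aa_BB" -> "aa"). Returns 0 once no underscore is
   left, i.e. when the language-only level has been reached. */
static int sketch_locale_parent(char *locale) {
    char *last = strrchr(locale, '_');
    if (last == NULL) {
        return 0;
    }
    *last = '\0';
    return 1;
}
```

<p>A lookup loop would try the relevant tags in the original locale first and then in each parent this function produces, stopping at the first bundle that contains one of them.</p>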
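<p>The difference between greedy, non-backtracking quantifiers and backtracking regular expressions can be reproduced with a small C sketch. <code>sketch_match_plus_then()</code> is a hypothetical matcher for the single pattern shape "<i>[set]</i><tt>+</tt> followed by one literal character"; it is not ICU's rule matcher, but it follows the behavior described under "Quantifiers": the <tt>+</tt> consumes every character it can and never gives any back.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher for "[set]+ literal", anchored at the start of text.
   set lists the allowed characters. The '+' run is greedy and never
   backtracks, so if it swallows the character that literal needed, the
   match fails. Returns the match length, or 0 for no match. */
static size_t sketch_match_plus_then(const char *text, const char *set,
                                     char literal) {
    size_t i = 0;
    while (text[i] != '\0' && strchr(set, text[i]) != NULL) {
        ++i; /* greedy: keep consuming while text[i] is in the set */
    }
    if (i == 0) {
        return 0; /* '+' requires at least one character */
    }
    return text[i] == literal ? i + 1 : 0; /* no backtracking on failure */
}
```

<p>A backtracking engine such as Perl's would give the "q" back and report a match of "abcq". With greedy quantifiers, one way to get the intended match is to exclude the following character from the set, as in <tt>[a-pr-z]+ q &gt; x;</tt>.</p>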
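<p>The three numeric forms accepted by the variable range pragma (decimal, octal with a leading '0', hexadecimal with a leading "0x") are exactly the forms that the C library function <code>strtoul()</code> accepts when called with base 0. The hypothetical <code>sketch_parse_range()</code> below relies on that to read the two code points from a pragma line. It is a sketch, not ICU's rule parser: it expects the exact keyword spacing shown and only checks that the range is ordered and does not exceed U+10FFFF.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for "use variable range <start> <end>;".
   strtoul with base 0 accepts decimal, leading-0 octal and 0x hex.
   Returns 1 and sets *start and *end on success, 0 otherwise. */
static int sketch_parse_range(const char *line, unsigned long *start,
                              unsigned long *end) {
    static const char prefix[] = "use variable range ";
    char *after;

    if (strncmp(line, prefix, sizeof prefix - 1) != 0) {
        return 0;
    }
    line += sizeof prefix - 1;

    *start = strtoul(line, &after, 0);
    if (after == line) {
        return 0; /* no number found */
    }
    line = after;
    *end = strtoul(line, &after, 0); /* strtoul skips leading whitespace */
    if (after == line) {
        return 0;
    }
    while (*after == ' ') {
        ++after;
    }
    return *after == ';' && *start <= *end && *end <= 0x10FFFF;
}
```

<p>For example, <tt>use variable range 59392 0167777;</tt> describes the same range as <tt>use variable range 0xE800 0xEFFF;</tt>. A complete implementation would also confirm that the rule set uses no characters inside the chosen range, as required above.</p>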

<h3><a name="NewsUnicodeSet">UnicodeSet Improvements</a></h3>

<ul>
<li><b><tt>[:Any:]</tt> set:</b> The set <tt>[:Any:]</tt> matches all
Unicode code points, that is, U+0000..U+10FFFF.</li>

<li><b><tt>\p{}</tt> syntax:</b> UnicodeSet now recognizes a Perl-like
syntax for character properties. Any property designated as
<tt>[:Foo:]</tt> may equivalently be designated <tt>\p{Foo}</tt>.</li>

<li><b>Short, medium, and long property names:</b> In addition to the
short property names, such as <tt>[:Ll:]</tt>, equivalent medium (e.g.,
<tt>[:gc=Ll:]</tt>) and long (e.g.,
<tt>[:GeneralCategory=LowercaseLetter:]</tt>) forms are recognized. See
the <a href=
"http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/unicodeset_properties.html">
UnicodeSet Properties design document</a> for details. As of this
release, general categories, numeric value, and script are
supported.</li>
</ul>
<hr>

<h2><a name="Download" href="#Download">How to Download the Source
Code</a></h2>

<p>There are two ways to download ICU releases:</p>

<ul>
<li><strong>Official Release Snapshot:</strong><br>
If you want to use ICU (as opposed to developing it), you should
download an official packaged version of the ICU source code. These
versions are tested more thoroughly than day-to-day development builds of
the system, and they are packaged in zip and tar files for convenient
download. These packaged files can be found at <a href=
"http://oss.software.ibm.com/icu/download/">http://oss.software.ibm.com/icu/download/</a>.<br>

The packaged snapshots are named <strong>icu-nnnn.zip</strong> or
<strong>icu-nnnn.tgz</strong>, where nnnn is the version number. The .zip
file is used for Windows platforms, while the .tgz file is preferred on
most other platforms.<br>
Please unpack this file. It will reconstruct the source directory,
including anonymous CVS control directories (see below).</li>

<li><strong>CVS Source Repository:</strong><br>
If you are interested in developing features, patches, or bug fixes for
ICU, you should probably be working with the latest version of the ICU
source code. You will need to check the code out of our CVS repository to
ensure that you have the most recent version of all of the files. See our
<a href="http://oss.software.ibm.com/icu/develop/cvs.html">CVS page</a>
for details.</li>
</ul>

<h2><a name="SourceCode" href="#SourceCode">ICU Source Code
Organization</a></h2>

<p>In the descriptions below, <strong><i>&lt;ICU&gt;</i></strong> is the
full path name of the icu directory (the top-level directory from the
distribution archives) in your file system.</p>

<table border="1" cellpadding="0" width="100%" summary="">
<caption>
The following files describe the code drop.
</caption>

<tr>
<td>readme.html</td>

<td>Describes the International Components for Unicode (this file)</td>
</tr>

<tr>
<td>license.html</td>

<td>Contains the text of the ICU license</td>
</tr>
</table>

<p><br>
</p>

<table border="1" cellpadding="0" width="100%" summary="">
<caption>
The following directories contain source code and data files.
</caption>

<tr>
<td><i>&lt;ICU&gt;</i>/source/common/</td>

<td>The core Unicode and support functionality, such as resource
bundles, character properties, locales, codepage conversion,
normalization, Unicode properties, Locale, and UnicodeString.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/i18n/</td>

<td>Modules in i18n are generally the more data-driven, that is to say
resource bundle driven, components. These deal with higher-level
internationalization issues such as formatting, collation, text break
analysis, and transliteration.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/test/intltest/</td>

<td>A test suite including all C++ APIs. For information about running
the test suite, see the User's Guide.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/test/cintltst/</td>

<td>A test suite written in C, including all C APIs. For information
about running the test suite, see the User's Guide.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/data/</td>

<td>
This directory contains the source data in text format, which is
compiled into binary form during the ICU build process. The output
from these files is stored in <i>&lt;ICU&gt;</i>/source/data/build
while awaiting further packaging.

<ul>
<li><b>unidata/</b> This directory contains the Unicode data files.
Please see <a href=
"http://www.unicode.org/">http://www.unicode.org/</a> for more
information.</li>

<li>
<p><b>Resource Bundle sources</b> .txt files containing ICU
language and culture-specific localization data. Two special
bundles are <b>root</b>, which is the fallback data and parent of
other bundles, and <b>index</b>, which contains a list of
installed bundles. <b>resfiles.txt</b> contains the list of
resource bundle files.</p>

<p>This directory also contains the transliteration bundles, and the list
of installed transliteration files in <b>translit_index.txt</b>.</p>

<p>All resource bundles are compiled into .res files. The
<b>ucmfiles.txt</b> file contains the list of converter
files.</p>
</li>

<li><b>Code page converter tables</b> .ucm files containing
mappings to and from Unicode. These are compiled into .cnv
files.</li>

<li><b>convrtrs.txt</b> is the alias mapping table from various
converter name formats to ICU internal format and vice versa. It
produces cnvalias.dat.</li>

<li><b>timezone.txt</b> is a generated file which is compiled into
tz.dat, containing time zone information.</li>
</ul>
</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/data</td>

<td>This directory is where the final, packaged version of the ICU
binary data ends up. The intermediate individual data files (.res,
.cnv) are kept in the subdirectory
"<i>&lt;ICU&gt;</i>/source/data/build" prior to packaging.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/tools</td>

<td>Tools for generating the data files. Data files are generated by
invoking <i>&lt;ICU&gt;</i>/source/data/build/makedata.bat on Win32 or
<i>&lt;ICU&gt;</i>/source/make on Unix.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/samples</td>

<td>Various sample programs that use ICU</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/extra</td>

<td>Unsupported API additions. Currently, it contains the 'ustdio'
file I/O library</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/layout</td>

<td>Contains the ICU layout engine (not a rasterizer).</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/packaging<br>
<i>&lt;ICU&gt;</i>/debian</td>

<td>These directories contain scripts and tools for packaging the final
ICU build for various release platforms.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/config</td>

<td>Contains helper makefiles for platform-specific build commands.
Used by 'configure'.</td>
</tr>

<tr>
<td><i>&lt;ICU&gt;</i>/source/allinone</td>

<td>Contains top-level ICU project files, for instance to build all of
ICU under one MSVC project.</td>
</tr>
</table>
<!-- end of ICU structure ==================================== -->

<h2><a name="HowToBuild" href="#HowToBuild">How To Build And Install
|
|
ICU</a></h2>
|
|
|
|
<h3><a name="HowToBuildSupported" href="#HowToBuildSupported">Supported
|
|
Platforms</a></h3>
|
|
|
|
<table border="1" cellpadding="3" summary="">
|
|
<caption>
|
|
Here is a status of functionality of ICU on several different
|
|
platforms.
|
|
</caption>
|
|
|
|
<tr>
|
|
<th>Operating system</th>
|
|
|
|
<th>Compiler</th>
|
|
|
|
<th>Testing frequency</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Windows 98/NT/2000</td>
|
|
|
|
<td>Microsoft Visual C++ 6.0</td>
|
|
|
|
<td>Reference platform</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Red Hat Linux 6.1</td>
|
|
|
|
<td>gcc 2.95.2</td>
|
|
|
|
<td>Reference platform</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>AIX 4.3.3</td>
|
|
|
|
<td>xlC 3.6.4</td>
|
|
|
|
<td>Reference platform</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Solaris 2.6</td>
|
|
|
|
<td>Workshop Pro CC 4.2</td>
|
|
|
|
<td>Reference platform</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>HP/UX 11.01</td>
|
|
|
|
<td>aCC A.12.10</td>
|
|
|
|
<td>Reference platform</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>AIX 5.1.0 L</td>
|
|
|
|
<td>Visual Age C++ 5.0</td>
|
|
|
|
<td>Regularly tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Solaris 2.7</td>
|
|
|
|
<td>Workshop Pro CC 6.0</td>
|
|
|
|
<td>Regularly tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Solaris 2.6</td>
|
|
|
|
<td>gcc 2.91.66</td>
|
|
|
|
<td>Regularly tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>FreeBSD 4.4</td>
|
|
|
|
<td>gcc 2.95.3</td>
|
|
|
|
<td>Regularly tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>HP/UX 11.01</td>
|
|
|
|
<td>CC A.03.10</td>
|
|
|
|
<td>Regularly tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>OS/390 (zSeries)</td>
|
|
|
|
<td>CC r10</td>
|
|
|
|
<td>Regularly tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>AS/400 (iSeries) V5R1</td>
|
|
|
|
<td>iCC</td>
|
|
|
|
<td>Rarely tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>NetBSD, OpenBSD</td>
|
|
|
|
<td> </td>
|
|
|
|
<td>Rarely tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>SGI/IRIX</td>
|
|
|
|
<td> </td>
|
|
|
|
<td>Rarely tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>PTX</td>
|
|
|
|
<td> </td>
|
|
|
|
<td>Rarely tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>OS/2</td>
|
|
|
|
<td>Visual Age</td>
|
|
|
|
<td>Rarely tested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Macintosh</td>
|
|
|
|
<td> </td>
|
|
|
|
<td>Needs help to port</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p><br>
|
|
</p>
|
|
|
|
<p><strong>Key to testing frequency</strong></p>
|
|
|
|
<dl>
|
|
<dt><i>Reference platform</i></dt>
|
|
|
|
<dd>ICU will work on these platforms with these compilers</dd>
|
|
|
|
<dt><i>Regularly tested</i></dt>
|
|
|
|
<dd>ICU should work on these platforms with these compilers</dd>
|
|
|
|
<dt><i>Rarely tested</i></dt>
|
|
|
|
<dd>ICU has been ported to these platforms but may not have been tested
|
|
there recently</dd>
|
|
</dl>
|
|
|
|
<h3><a name="HowToBuildWindows" href="#HowToBuildWindows">How To Build And
|
|
Install On Windows</a></h3>
|
|
|
|
<p>Building International Components for Unicode requires:</p>
|
|
|
|
<ul>
|
|
<li>Microsoft NT 4.0 and above, or Windows 98 and above</li>
|
|
|
|
<li>Microsoft Visual C++ 6.0 (Service Pack 2 is required to work with the
|
|
release build of max speed optimization).</li>
|
|
</ul>
|
|
|
|
<p>The steps are:</p>
|
|
|
|
<ol>
|
|
<li>Unzip the icu-XXXX.zip file into any convenient location. Using
|
|
command line zip, type "unzip -a icu-XXXX.zip -d drive:\directory", or
|
|
just use WinZip.</li>
|
|
|
|
<li>Be sure that the ICU binary directory, <i><ICU></i>\bin\, is
|
|
included in the <strong>PATH</strong> environment variable. The tests
|
|
will not work without the location of the ICU dll files in the path.</li>
|
|
|
|
<li>Set the <strong>TZ</strong> environment variable to
|
|
<strong>PST8PDT</strong>. The tests will not work in any other
|
|
timezone.</li>
|
|
|
|
<li>Open the "<i><ICU></i>\source\allinone\allinone.dsw" workspace
|
|
file in Microsoft Visual C++ 6.0. (This workspace includes all the
|
|
International Components for Unicode libraries, necessary ICU building
|
|
tools, and the intltest and cintltest test suite projects). Please see
|
|
the note below if you want to build from the command line instead.</li>
|
|
|
|
<li>Set the active Project to the "all" project. To do this: Choose
|
|
"Project" menu, and select "Set active project". In the submenu, select
|
|
the "all" workspace.</li>
|
|
|
|
<li>Set the active configuration to "Win32 Debug" or "Win32 Release" (See
|
|
<a href="#HowToBuildWindowsConfig">note</a> below).</li>
|
|
|
|
<li>Choose the "Build" menu and select "Rebuild All". If you want to
|
|
build the Debug and Release at the same time, see the <a href=
|
|
"#HowToBuildWindowsBatch">note</a> below.</li>
|
|
|
|
<li>Run the C++ test suite, "intltest". To do this: set the active
|
|
project to "intltest", and press F5 to run it.</li>
|
|
|
|
<li>Run the C test suite, "cintltst". To do this: set the active project
|
|
to "cintltst", and press F5 to run it.</li>
|
|
|
|
<li>Make sure that both "cintltst" and "intltest" passed without any
|
|
errors. The return codes are non-zero when they do not pass. Visual C++
|
|
will display the return codes in the debug tag of the output window. When
|
|
"intltest" and "cintltest" return 0, it means that everything is
|
|
installed correctly. You can press Ctrl+F5 on the test project to run the
|
|
test and see what error messages were displayed (if any tests
|
|
failed).</li>
|
|
|
|
<li>Reset the <strong>TZ</strong> environment variable to its original
|
|
value, unless you plan on testing ICU any further.</li>
|
|
|
|
<li>You are now able to develop applications with ICU.</li>
|
|
</ol>
|
|
|
|
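<p>The PATH and TZ settings in steps 2 and 3 are the most common reason the
test suites fail. The following is a minimal sketch, not part of ICU, of a
pre-flight check a developer could run before the tests. The function name
<code>icu_test_preflight</code> and its flags are hypothetical; it assumes
PATH entries are separated by ';' and compared case-insensitively, as on
Windows, and it ignores trailing backslashes on directory names.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result flags for icu_test_preflight(). */
#define PREFLIGHT_OK        0
#define PREFLIGHT_BAD_TZ    1  /* TZ is unset or is not PST8PDT */
#define PREFLIGHT_NO_BINDIR 2  /* ICU bin directory is not in PATH */

/* Length of s[0..len) with any trailing backslashes removed. */
static size_t trim_backslashes(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == '\\') {
        len--;
    }
    return len;
}

/* ASCII case-insensitive comparison of two counted strings. */
static int equal_nocase(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t i;
    if (alen != blen) {
        return 0;
    }
    for (i = 0; i < alen; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) {
            return 0;
        }
    }
    return 1;
}

/* Nonzero if dir is one of the ';'-separated entries of path. */
static int path_contains(const char *path, const char *dir)
{
    size_t dlen = trim_backslashes(dir, strlen(dir));
    const char *start = path;

    for (;;) {
        const char *semi = strchr(start, ';');
        size_t len = semi != NULL ? (size_t)(semi - start) : strlen(start);
        if (equal_nocase(start, trim_backslashes(start, len), dir, dlen)) {
            return 1;
        }
        if (semi == NULL) {
            return 0;
        }
        start = semi + 1;
    }
}

/* Check the TZ and PATH values that the ICU test suites depend on. */
int icu_test_preflight(const char *tz, const char *path, const char *icu_bin)
{
    int flags = PREFLIGHT_OK;
    if (tz == NULL || strcmp(tz, "PST8PDT") != 0) {
        flags |= PREFLIGHT_BAD_TZ;
    }
    if (path == NULL || icu_bin == NULL || !path_contains(path, icu_bin)) {
        flags |= PREFLIGHT_NO_BINDIR;
    }
    return flags;
}
```

<p>A caller would typically pass <code>getenv("TZ")</code> and
<code>getenv("PATH")</code> together with the actual location of
<i><ICU></i>\bin, and report each flag that is set.</p>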
<p><a name="HowToBuildWindowsCommandLine"><strong>Using MSDEV At The
|
|
Command Line Note:</strong></a> You can build ICU from the command line.
|
|
Assuming that you have properly installed Microsoft Visual C++ to support
|
|
command line execution, you can run the following command, 'msdev
|
|
<i><ICU></i>\source\allinone\allinone.dsw /MAKE "ALL"'.</p>
|
|
|
|
<p><a name="HowToBuildWindowsConfig"><strong>Setting Active Configuration
|
|
Note:</strong></a> To set the active configuration, two different
|
|
possibilities are:</p>
|
|
|
|
<ul>
|
|
<li>Choose "Build" menu, select "Set Active Configuration", and select
|
|
"Win32 Release" or "Win32 Debug".</li>
|
|
|
|
<li>Another way is to select "Customize" in the "Tools" menu, select the
|
|
"Toolbars" tab, enable "Build" instead of "Build Minibar", and click on
|
|
"Close". This will bring up a toolbar which you can move aside the other
|
|
permanent toolbars at the top of the MSVC window. The advantage is that
|
|
you now have an easy-to-reach pop-up menu that will always show the
|
|
currently selected active configuration. Or, you can drag the project and
|
|
configuration selections and drop them on the menu bar for later
|
|
selection.</li>
|
|
</ul>
|
|
|
|
<p><a name="HowToBuildWindowsBatch"><strong>Batch Configuration
|
|
Note:</strong></a> If you want to build the Debug and Release
|
|
configurations at the same time, choose "Build" menu and select "Batch
|
|
Build..." instead (and mark all configurations as checked), then click the
|
|
button named "Rebuild All". The "all" workspace will build all the test
|
|
programs as well as the tools for generating binary locale data files. The
|
|
"makedata" project will be run automatically to convert the locale data
|
|
files from text format into icudata.dll.</p>
|
|
|
|
<h3><a name="HowToBuildUnix" href="#HowToBuildUnix">How To Build And
|
|
Install On Unix</a></h3>
|
|
|
|
<p>Building International Components for Unicode on Unix requires:</p>
|
|
|
|
<p>A UNIX C++ compiler, (gcc, cc, xlc_r, etc...) installed on the target
|
|
machine. A recent version of GNU make (3.7+). For a list of OS/390 tools
|
|
please view the <a href="#HowToBuildOS390">OS/390 build section</a> of this
|
|
document for further details.</p>
|
|
|
|
<p>The steps are:</p>
|
|
|
|
<ol>
|
|
<li>Decompress the icuXXXX.tar (or icuXXXX.tgz) file. For example,
|
|
<tt>gunzip -d < icuXXXX.tgz | tar xvf -</tt></li>
|
|
|
|
<li>Change directory to the "icu/source".</li>
|
|
|
|
<li>chmod +x runConfigureICU install-sh</li>
|
|
|
|
<li>Run the <a href="source/runConfigureICU">runConfigureICU</a> script
|
|
for your platform. If you are not using the runConfigureICU script or
|
|
your platform is not supported by the script, you need to set your CC,
|
|
CXX, CFLAGS and CXXFLAGS environment variables, and type "./configure".
|
|
You can type "./configure --help" to print the available options.</li>
|
|
|
|
<li>Type "gmake" to compile the libraries and all the data files.</li>
|
|
|
|
<li>
|
|
Optionally, type "gmake check" to verify the test suite.
|
|
|
|
<ul>
|
|
<li>
|
|
<b>Note:</b> You may have to set certain variables if you with to
|
|
run test programs individually, that is apart from "make check".
|
|
The <strong>TZ</strong> environment variable needs to be set to
|
|
<strong>PST8PDT</strong>. Also, the environment variable
|
|
<strong>ICU_DATA</strong> must be set to the full pathname of the
|
|
data directory, to indicate where the locale data files and
|
|
conversion mapping tables are. The trailing "/" is required after
|
|
the directory name (e.g. "$Root/source/data/" will work, but the
|
|
value "$Root/source/data" is not acceptable).
|
|
|
|
<p>When running samples or other applications, ICU_DATA only needs
|
|
to be set if the data is not installed (such as via 'make install')
|
|
into the default location.</p>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li>Type "gmake install" to install.</li>
|
|
</ol>
|
|
|
|
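<p>Because a missing trailing "/" on <strong>ICU_DATA</strong> is an easy
mistake, a wrapper that launches individual test programs might normalize
the value first. The following C function is a hypothetical helper, not part
of ICU: it copies a directory name into a caller-supplied buffer and appends
"/" if it is missing, failing if the name is empty or the buffer is too
small.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy dir into out (of size outsize), appending a
   trailing '/' if dir does not already end with one, as ICU_DATA requires.
   Returns 0 on success, -1 if dir is empty or out is too small. */
int icu_data_with_slash(const char *dir, char *out, size_t outsize)
{
    size_t len = strlen(dir);
    int need_slash;

    if (len == 0) {
        return -1;
    }
    need_slash = (dir[len - 1] != '/');
    /* Room for the text, the optional slash, and the terminating NUL. */
    if (len + (size_t)need_slash + 1 > outsize) {
        return -1;
    }
    memcpy(out, dir, len);
    if (need_slash) {
        out[len++] = '/';
    }
    out[len] = '\0';
    return 0;
}
```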
<p>Some platforms use package management tools to control the installation
and uninstallation of files on the system, as well as the integrity of the
system configuration. You may want to check whether ICU can be packaged for your
package management tools by looking in the "packaging" directory. (Please
note that if you are using a snapshot of ICU from CVS, the packaging scripts
or related files may not be up to date with the rest of ICU, so use them
with caution.)</p>

<h3><a name="HowToBuildOS390" href="#HowToBuildOS390">OS/390 (zSeries)
|
|
Platform</a></h3>
|
|
|
|
<p>If you are building on the OS/390 UNIX System Services platform, it is
|
|
important that you understand a few details:</p>
|
|
|
|
<ul>
|
|
<li>The gnu utilities gmake and gzip/gunzip are needed and can be
|
|
obtained for OS/390 from <a href=
|
|
"http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc">
|
|
z/OS Unix - Tools and Toys</a>. Documentation on these tools can be found
|
|
at the <a href=
|
|
"http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg245944.html">
|
|
Open Source Software for OS/390 UNIX</a> Red Book.</li>
|
|
|
|
<li>Encoding considerations: The source code assumes that it is compiled
|
|
with codepage ibm-1047 (to be exact, the UNIX System Services variant of
|
|
it). The pax command converts all of the source code files from ASCII to
|
|
codepage ibm-1047 (USS) EBCDIC. However, some files are binary files and
|
|
must not be converted, or must be converted back to their original state.
|
|
You can use the <a href="as_is\os390\unpax-icu.sh">unpax-icu.sh</a>
|
|
script to do this for you automatically. It will unpackage the tar file
|
|
and convert all the necessary files for you automatically.
|
|
<!--The files that must not be converted to ibm-1047 are the
|
|
following:
|
|
|
|
<ul>
|
|
<li>All UTF-8 files</li>
|
|
|
|
<li>icu/data/*.brk</li>
|
|
|
|
<li>icu/source/test/testdata/uni-text.bin</li>
|
|
|
|
<li>icu/source/test/testdata/th18057.txt</li>
|
|
</ul>
|
|
Such a conversion can be done using iconv:<br>
|
|
<code>iconv -f IBM-1047 -t ISO8859-1 uni-text.bin >
|
|
uni-text.bin</code-->
|
|
</li>
|
|
|
|
<li>
|
|
<p>OS/390 supports both native S/390 hexadecimal floating point and,
|
|
(with Version 2.6 and later) IEEE binary floating point. This is a
|
|
compile time option. Applications built with IEEE should use ICU dlls
|
|
that are built with IEEE (and vice versa). The environment variable
|
|
IEEE390=1 will cause the OS/390 version of ICU to be built with IEEE
|
|
floating point. The default is native hexadecimal floating point.<br>
|
|
<em>Important:</em> Currently (ICU 1.4.2), native floating point
|
|
support is sufficient for codepage conversion, resource bundle and
|
|
UnicodeString operations, but the Format APIs, especially ChoiceFormat,
|
|
require IEEE binary floating point.</p>
|
|
|
|
<p>Examples for configuring ICU:<br>
|
|
Debug build: <code>IEEE390=1 ./runConfigureICU --enable-debug
|
|
zOS/cxx</code><br>
|
|
Release build: <code>IEEE390=1 ./runConfigureICU zOS/cxx</code></p>
|
|
</li>
|
|
|
|
<li>Since the default make on OS/390 is not gmake, the pkgdata tool
|
|
requires that the "make" command is aliased to your installed version of
|
|
gmake.</li>
|
|
|
|
<li>The makedep executable that is used with the OS/390 ICU build process
|
|
is not shipped with ICU. It is available at the <a href=
|
|
"http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc">
|
|
z/OS Unix - Tools and Toys</a> site. The PATH environment variable should
|
|
be updated to contain the location of this executable prior to build.
|
|
Alternatively, makedep may be moved into an existing PATH directory.</li>
|
|
|
|
<li>
|
|
To run all of the tests for ICU, use "gmake check". When running
|
|
individual tests of the test suite, the TZ environment variable should
|
|
be set to export TZ="PST8PDT" so that time zone comparisons are
|
|
correct. Building and testing ICU without using gmake requires that the
|
|
ICU libraries in the LIBPATH. In other words, the LIBPATH should
|
|
contain (each path prepended with the root directory that contains the
|
|
icu directory):
|
|
|
|
<ul>
|
|
<li>icu/source/common</li>
|
|
|
|
<li>icu/source/data</li>
|
|
|
|
<li>icu/source/i18n</li>
|
|
|
|
<li>icu/source/tools/ctestfw</li>
|
|
|
|
<li>icu/source/tools/toolutil</li>
|
|
|
|
<li>icu/source/extra/ustdio</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
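<p>Because an application and the ICU DLLs must agree on the floating-point
format, an application could record at compile time which format it was
built for. The following is a minimal sketch, not part of ICU. It assumes
that the compiler's standard <code>float.h</code> reports
<code>FLT_RADIX</code> as 2 for IEEE binary floating point and as 16 for
S/390 hexadecimal floating point; the function name and the opt-in macro
<code>MYAPP_REQUIRE_IEEE_FLOAT</code> are hypothetical.</p>

```c
#include <float.h>

/* Hypothetical opt-in: define MYAPP_REQUIRE_IEEE_FLOAT when building an
   application that uses the ICU Format APIs, so that a hexadecimal
   floating-point build fails at compile time instead of misbehaving. */
#if defined(MYAPP_REQUIRE_IEEE_FLOAT) && FLT_RADIX != 2
#error "The ICU Format APIs require IEEE binary floating point"
#endif

/* Report the floating-point format this file was compiled for.
   FLT_RADIX is guaranteed to be usable in #if directives. */
const char *build_float_format(void)
{
#if FLT_RADIX == 2
    return "IEEE binary";
#elif FLT_RADIX == 16
    return "S/390 hexadecimal";
#else
    return "unknown";
#endif
}
```

<p>This only reflects how the application itself was compiled; it cannot
tell whether the ICU DLLs it loads were built with IEEE390=1.</p>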
<h4>OS/390 Batch (PDS) support</h4>

<p>By default, ICU builds its libraries into the HFS. However, there is a
390-specific switch to build some libraries into PDS files. The switch is
the environment variable OS390BATCH; if it is set, the following libraries
are built into PDS files: libicuuc<i>XX</i>.dll, libicudt<i>XX</i>e.dll,
libicudt<i>XX</i>e_390.dll, and libtestdata.dll. Turning on OS390BATCH does
not turn off the normal HFS build, so the HFS DLLs are always
created.</p>

<p>The names of the PDS files are determined by the values of the
environment variables LOADMOD and LOADEXP. These variables must contain
the target PDS names whenever the OS390BATCH variable is set. LOADMOD is
the library (.dll) target dataset and LOADEXP is the side deck (.x) target
dataset.</p>

<p>The PDS member names are as follows:</p>
<pre>
<samp>IXMI<i>XX</i>UC --> libicuuc<i>XX</i>.dll
IXMI<i>XX</i>DA --> libicudt<i>XX</i>e.dll
IXMI<i>XX</i>D1 --> libicudt<i>XX</i>e_390.dll</samp>
</pre>

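<p>Since the same <i>XX</i> digits appear in both the member name and the
DLL name, a build script could derive one from the other. The C sketch below
encodes only the three mappings listed above (there is no listed member name
for libtestdata.dll, so it is rejected). The function name is hypothetical,
and "20" in the tests is an assumed value for <i>XX</i>.</p>

```c
#include <stdio.h>
#include <string.h>

/* Member-name suffix and DLL name template for each mapping listed above;
   "%s" stands for the version digits XX. */
struct pds_mapping {
    const char *member_suffix;
    const char *dll_template;
};

static const struct pds_mapping pds_mappings[] = {
    { "UC", "libicuuc%s.dll" },
    { "DA", "libicudt%se.dll" },
    { "D1", "libicudt%se_390.dll" },
};

/* Hypothetical helper: write the PDS member name (IXMI + XX + suffix) for
   the given DLL into member. Returns 0 on success, -1 if the DLL is not
   one of the mapped libraries or a buffer is too small. */
int pds_member_for(const char *dll, const char *version,
                   char *member, size_t member_size)
{
    char expected[64];
    size_t i;

    for (i = 0; i < sizeof pds_mappings / sizeof pds_mappings[0]; i++) {
        int n = snprintf(expected, sizeof expected,
                         pds_mappings[i].dll_template, version);
        if (n < 0 || (size_t)n >= sizeof expected) {
            return -1;
        }
        if (strcmp(dll, expected) == 0) {
            n = snprintf(member, member_size, "IXMI%s%s",
                         version, pds_mappings[i].member_suffix);
            return (n < 0 || (size_t)n >= member_size) ? -1 : 0;
        }
    }
    return -1;
}
```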
<p>Example PDS attributes are as follows:</p>
<pre>
<samp>Data Set Name . . . : <i>USER</i>.ICU.LOAD
General Data
Management class. . : **None**
Storage class . . . : BASE
Volume serial . . . : TSO007
Device type . . . . : 3390
Data class. . . . . : LOAD
Organization . . . : PO
Record format . . . : U
Record length . . . : 0
Block size . . . . : 32760
1st extent cylinders: 40
Secondary cylinders : 59
Data set name type : PDS

Data Set Name . . . : <i>USER</i>.ICU.EXP
General Data
Management class. . : **None**
Storage class . . . : BASE
Volume serial . . . : TSO007
Device type . . . . : 3390
Data class. . . . . : **None**
Organization . . . : PO
Record format . . . : FB
Record length . . . : 80
Block size . . . . : 3200
1st extent cylinders: 3
Secondary cylinders : 3
Data set name type : PDS</samp>
</pre>

<h3><a name="HowToBuildOS400" href="#HowToBuildOS400">OS/400 (iSeries)
|
|
Platform</a></h3>
|
|
|
|
<p>ICU Reference Release 1.8.1 contains partial support for the 400
|
|
platform, but additional work by the user is currently needed to get it to
|
|
build properly. A future release of ICU should work out-of-the-box under
|
|
OS/400.</p>
|
|
|
|
<ul>
|
|
<li>
|
|
Requirements:
|
|
|
|
<ul>
|
|
<li>QSHELL interpreter installed (install base option 30, operating
|
|
system)</li>
|
|
<!--li>QShell Utilities, PRPQ 5799-XEH (not required for V4R5)</li-->
|
|
|
|
<li>ILE C/C++ Compiler for iSeries, LPP 5722-WDS</li>
|
|
|
|
<li>The latest GNU facilities (You can get the GNU facilities for
|
|
OS/400 from <a href=
|
|
"http://www.as400.ibm.com/developer/porting/gnu_utilities.html">http://www.as400.ibm.com/developer/porting/gnu_utilities.html</a>).
|
|
Older versions may not work properly.</li>
|
|
</ul>
|
|
<!-- end requirements -->
|
|
</li>
|
|
|
|
<li>
|
|
Build environment setup:
|
|
|
|
<ol>
|
|
<li>
|
|
Create AS400 target library. This library will be the target for
|
|
the resulting modules, programs and service programs. You will
|
|
specify this library on the OUTPUTDIR environment variable in step
|
|
2.<br>
|
|
|
|
<pre>
|
|
<samp>CRTLIB LIB(<i>libraryname</i>)</samp>
|
|
</pre>
|
|
<br>
|
|
</li>
|
|
|
|
<li>
|
|
Set up the following environment variables in your build process
|
|
(use the <i>libraryname</i> from the previous step)
|
|
<pre>
|
|
<samp>ADDENVVAR ENVVAR(ICU_DATA) VALUE('/icu/source/data')
|
|
ADDENVVAR ENVVAR(CC) VALUE('/usr/bin/icc')
|
|
ADDENVVAR ENVVAR(CXX) VALUE('/usr/bin/icc')
|
|
ADDENVVAR ENVVAR(MAKE) VALUE('/usr/bin/gmake')
|
|
ADDENVVAR ENVVAR(OUTPUTDIR) VALUE('<i>libraryname</i>')</samp>
|
|
</pre>
|
|
<i>libraryname</i> identifies target as400 library for *module,
|
|
*pgm and *srvpgm objects.<br>
|
|
<br>
|
|
</li>
|
|
<!--li>Add QCXXN, to your build process library list. This results in
|
|
the resolution of CRTCPPMOD used by the icc compiler</li-->
|
|
|
|
<li>
|
|
In order to get the tests to run correctly, the QUTCOFFSET needs to
|
|
be set to the Pacific Time Zone offset.<br>
|
|
<br>
|
|
To check your QUTCOFFSET:
|
|
<pre>
|
|
<samp>DSPSYSVAL SYSVAL(QUTCOFFSET)</samp>
|
|
</pre>
|
|
<br>
|
|
To change your QUTCOFFSET:<br>
|
|
<pre>
|
|
<samp>CHGSYSVAL SYSVAL(QUTCOFFSET) VALUE('-0800')</samp>
|
|
</pre>
|
|
You should change -0800 to -0700 for daylight savings.<br>
|
|
<br>
|
|
</li>
|
|
|
|
<li>Run 'CHGJOB CCSID(37)'</li>
|
|
|
|
<li>Run 'QSH'</li>
|
|
|
|
<li>Run gunzip on the ICU source code compressed tar archive
|
|
(icu-<i>X</i>-<i>Y</i>.tar.gz or icu-<i>X</i>-<i>Y</i>.tgz).</li>
|
|
|
|
<li>Run unpax-icu.sh on the tar file from the ICU download page.</li>
|
|
|
|
<li>Change your current directory to icu/source.</li>
|
|
|
|
<li>Run 'as_is/os400/configure --host=as400-os400'</li>
|
|
|
|
<li>Run 'gmake -e'. The '-e' option is needed to pickup the
|
|
compilers.</li>
|
|
|
|
<li>Run 'gmake -e check' to run the tests.</li>
|
|
</ol>
|
|
<!-- end build environment -->
|
|
</li>
|
|
</ul>
|
|
|
|
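<p>The QUTCOFFSET values in the example above are a sign followed by hours
and minutes, so Pacific Standard Time (UTC-8) is '-0800' and Pacific
Daylight Time (UTC-7) is '-0700'. If a setup script computes the value, a
helper like the following could produce it from an offset in minutes. This
is a hypothetical sketch, not part of ICU or OS/400; the 24-hour range check
is an assumed bound.</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a UTC offset given in minutes (negative
   west of Greenwich) as a sign followed by HHMM, the form used in the
   CHGSYSVAL example. Returns 0 on success, -1 if the offset is more than
   24 hours or out is too small (it needs at least 6 bytes). */
int format_utc_offset(int minutes, char *out, size_t outsize)
{
    int abs_min = abs(minutes);
    int n;

    if (abs_min > 24 * 60) {
        return -1;
    }
    n = snprintf(out, outsize, "%c%02d%02d",
                 minutes < 0 ? '-' : '+', abs_min / 60, abs_min % 60);
    /* snprintf returns the untruncated length; reject truncation. */
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```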
<h2><a name="ImportantNotes" href="#ImportantNotes">Important Notes About
|
|
Using ICU</a></h2>
|
|
|
|
<h3><a name="ImportantNotesWindows" href="#ImportantNotesWindows">Windows
|
|
Platform</a></h3>
|
|
|
|
<p>If you are building on the Win32 platform, it is important that you
|
|
understand a few of the following build details.</p>
|
|
|
|
<h4>DLL directories and the PATH setting</h4>
|
|
|
|
<p>As delivered, the International Components for Unicode build as several
|
|
DLLs which are placed in the "<i><ICU></i>\bin" directory. You must
|
|
add this directory to the PATH environment variable in your system, or any
|
|
executables you build will not be able to access International Components
|
|
for Unicode libraries. Alternatively, you can copy the DLL files into a
|
|
directory already in your PATH, but we do not recommend this. You can wind
|
|
up with multiple copies of the DLL and wind up using the wrong one.</p>
|
|
|
|
<h4><a name="ImportantNotesWindowsPath">Changing your PATH</a></h4>
|
|
|
|
<ul>
|
|
<li><strong>Windows 2000</strong>: Use the System Icon in the Control
|
|
Panel. Pick the "Advanced" tab. Select the "Environment Variables..."
|
|
button. Select the variable PATH in the lower box, and select the lower
|
|
"Edit..." button. In the "Variable Value" box, append the string
|
|
";<i><ICU></i>\bin" to the end of the path string. If there is
|
|
nothing there, just type in "<i><ICU></i>\bin". Click the Set
|
|
button, then the OK button.</li>
|
|
|
|
<li><strong>Windows NT</strong>: Use the System Icon in the Control
|
|
Panel. Pick the "Environment" tab, and select the variable PATH in the
|
|
lower box. In the "value" box, append the string
|
|
";<i><ICU></i>\bin" at the end of the path string. If there is
|
|
nothing there, just type in "<i><ICU></i>\bin". Click the Set
|
|
button, then the OK button.</li>
|
|
|
|
<li><strong>Windows 95/98/ME</strong>: Edit the autoexec.bat, and add the
|
|
following line to the end of file, "SET
|
|
PATH=%PATH%;<i><ICU></i>\bin"</li>
|
|
</ul>
|
|
|
|
<p>Note: when packaging a Windows application for distribution and
installation on user systems, copies of the ICU DLLs should be included
with the application, and installed for exclusive use by the application.
This is the only way to ensure that your application is running with the
same version of ICU, built with exactly the same options, that you developed
and tested with. Refer to Microsoft's guidelines on the usage of DLLs, or
search for the phrase "dll hell" on <a href=
"http://msdn.microsoft.com/">msdn.microsoft.com</a>.</p>

<h4>Linking with Runtime libraries</h4>

<p>All the DLLs link with the C runtime library "Debug Multithreaded DLL"
or "Multithreaded DLL" (compiler options /MDd and /MD, respectively). This
is changed through the Project Settings dialog, on the C/C++ tab, under
Code Generation. It is important that any executable or other DLL you build
which uses the International Components for Unicode DLLs links with these
runtime libraries as well. If you do not do this, you will get random
memory errors when you run the executable.</p>

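<p>One way to catch a mismatched runtime library early is a compile-time
guard in your own source files. The sketch below relies on the Microsoft
Visual C++ predefined macros <code>_MSC_VER</code>, <code>_DLL</code>
(defined when compiling with /MD or /MDd) and <code>_DEBUG</code> (defined
with the debug runtime options such as /MDd). The function name is
hypothetical and the guard is not part of ICU.</p>

```c
/* Hypothetical guard: stop the build if this file is compiled with
   Visual C++ against a static C runtime instead of the DLL runtime
   that the ICU DLLs use. Other compilers skip the check. */
#if defined(_MSC_VER) && !defined(_DLL)
#error "Use /MD (Multithreaded DLL) or /MDd (Debug Multithreaded DLL) with ICU"
#endif

/* Report which C runtime variant this translation unit was built for. */
const char *crt_variant(void)
{
#if !defined(_MSC_VER)
    return "not Visual C++";
#elif defined(_DEBUG)
    return "Debug Multithreaded DLL (/MDd)";
#else
    return "Multithreaded DLL (/MD)";
#endif
}
```

<p>The guard only checks your own compile options; it cannot tell whether
you are linking against the Debug or the Release build of the ICU DLLs.</p>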
<h3><a name="ImportantNotesUnix" href="#ImportantNotesUnix">Unix Type
|
|
Platform</a></h3>
|
|
|
|
<p>If you are building on a Unix platform, it is important that you add the
|
|
location of your ICU libraries (including the data library) to your
|
|
LD_LIBRARY_PATH environment variable. The ICU libraries may not link or
|
|
load properly without doing this.</p>
|
|
|
|
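<p>As a hedged illustration, the following C sketch picks the library
search-path variable using common compiler-predefined platform macros
(<code>_AIX</code>, <code>__MVS__</code> for OS/390, <code>__hpux</code>)
and checks whether a directory appears in its ':'-separated value. The
function names are hypothetical, the mapping covers only the platforms
named above, and every other system is assumed to use LD_LIBRARY_PATH.</p>

```c
#include <stddef.h>
#include <string.h>

/* Name of the environment variable the dynamic loader typically searches. */
const char *lib_path_variable(void)
{
#if defined(_AIX) || defined(__MVS__)
    return "LIBPATH";
#elif defined(__hpux)
    return "SHLIB_PATH";
#else
    return "LD_LIBRARY_PATH";
#endif
}

/* Nonzero if dir is exactly one of the ':'-separated entries in list. */
int lib_path_contains(const char *list, const char *dir)
{
    size_t dlen = strlen(dir);
    const char *start = list;

    if (list == NULL) {
        return 0;
    }
    for (;;) {
        const char *colon = strchr(start, ':');
        size_t len = colon != NULL ? (size_t)(colon - start) : strlen(start);
        if (len == dlen && strncmp(start, dir, len) == 0) {
            return 1;
        }
        if (colon == NULL) {
            return 0;
        }
        start = colon + 1;
    }
}
```

<p>For example, <code>lib_path_contains(getenv(lib_path_variable()),
"/usr/local/lib")</code> reports whether /usr/local/lib is on the search
path.</p>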
<h3><a name="ImportantNotesDefaultCP" href="#ImportantNotesDefaultCP">Using
|
|
the default codepage</a></h3>
|
|
|
|
<p>ICU has code to determine the default codepage of the system or process.
|
|
This default codepage can be used to convert <code>char *</code> strings to
|
|
and from Unicode.</p>
|
|
|
|
<p>Depending on system design, setup and APIs, it may not always be
|
|
possible to find a default codepage that fully works as expected. For
|
|
example,</p>
|
|
|
|
<ul>
|
|
<li>On Windows there are three encodings in use at the same time. Unicode
|
|
(UTF-16) is always used inside of Windows, while for <code>char *</code>
|
|
encodings there are two classes, called "ANSI" and "OEM" codepages. ICU
|
|
will use the ANSI codepage. Note that the OEM codepage is used by default
|
|
for console window output.</li>
|
|
|
|
<li>On some Unix-type systems, non-standard names are used for encodings,
|
|
or non-standard encodings are used altogether. Although ICU supports 200
|
|
encodings in its standard build and many more aliases for them, it will
|
|
not be able to recognize such non-standard names.</li>
|
|
|
|
<li>Some systems do not have a notion of a system or process codepage,
|
|
and may not have APIs for that.</li>
|
|
</ul>
|
|
|
|
<p>If you have means of detecting a default codepage name that are more
|
|
appropriate for your application, then you should set that name with
|
|
<code>ucnv_setDefaultName()</code> as the first ICU function call. This
|
|
makes sure that the internally cached default converter will be
|
|
instantiated from your preferred name.</p>
|
|
|
|
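<p>The following C sketch shows one way an application on a POSIX system
might choose that name before its first ICU call. It prefers an
application-specific setting (the hypothetical <code>app_setting</code>
argument, for example read from a configuration file), then the codeset
reported by the C library's <code>nl_langinfo(CODESET)</code>, and finally
US-ASCII. The function names are hypothetical; only the commented-out line
calls ICU.</p>

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for nl_langinfo() */
#endif
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Pick a codepage name: an explicit application setting wins, then the
   system's name, then "US-ASCII". Empty strings count as missing. */
const char *choose_codepage(const char *app_setting, const char *system_name)
{
    if (app_setting != NULL && app_setting[0] != '\0') {
        return app_setting;
    }
    if (system_name != NULL && system_name[0] != '\0') {
        return system_name;
    }
    return "US-ASCII";
}

/* Query the codeset of the user's locale from the C library.
   Note: setlocale(LC_ALL, "") changes the process-wide locale. */
const char *system_codepage(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}

/* Typical use, before any other ICU function (needs "unicode/ucnv.h"):
 *     ucnv_setDefaultName(choose_codepage(app_setting, system_codepage()));
 */
```

<p>An application that already calls <code>setlocale()</code> at startup
could call <code>nl_langinfo(CODESET)</code> directly instead of using
<code>system_codepage()</code>.</p>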
<p>Starting in ICU 2.0, when a converter for the default codepage cannot be
opened, a fallback default codepage name and converter will be used. On
most platforms, this will be US-ASCII. For OS/390 (z/OS), ibm-1047-s390 is
the default fallback codepage. For AS/400 (iSeries), ibm-37 is the default
fallback codepage. This default fallback codepage is used when the
operating system is using a non-standard name for a default codepage, or
the converter was not packaged with ICU. The feature allows ICU to run in
unusual computing environments without completely failing.</p>

<h3><a name="ImportantNotesDeprecatedAPI" href=
"#ImportantNotesDeprecatedAPI">Methods for enabling deprecated
APIs</a></h3>

<h4>C</h4>

<p>Some deprecated C APIs can be enabled without recompiling the ICU
libraries. This can be achieved by defining certain symbols before
including the ICU header files. For example, the following program enables
the deprecated C formatting APIs:</p>
<pre>
<samp>#ifndef U_USE_DEPRECATED_FORMAT_API
# define U_USE_DEPRECATED_FORMAT_API 1
#endif

#include <stdio.h>
#include "unicode/utypes.h"
#include "unicode/udat.h"

int main(void) {
    UDateFormat *def, *fr;
    UErrorCode status = U_ZERO_ERROR;

    /* Deprecated udat_open() call form, enabled by the symbol above. */
    fr = udat_open(UDAT_FULL, UDAT_DEFAULT, "fr_FR", NULL, 0, &status);
    if (U_FAILURE(status)) {
        printf("Error creating the French date format using full time style: %s\n",
               u_errorName(status));
        return 1;
    }

    /* Request "en_US" explicitly instead of the default locale, so that
       the result does not depend on the machine's locale setting. */
    def = udat_open(UDAT_SHORT, UDAT_SHORT, "en_US", NULL, 0, &status);
    if (U_FAILURE(status)) {
        printf("Error creating the en_US date format: %s\n",
               u_errorName(status));
        udat_close(fr);
        return 1;
    }

    /* ... use the formatters ... */

    udat_close(def);
    udat_close(fr);
    return 0;
}</samp>
</pre>

<h4>C++</h4>

<p>Deprecated C++ APIs cannot be enabled without recompiling the ICU
libraries. Every service has a specific symbol that should be defined to
enable the deprecated API of that service. For example, to enable the
deprecated APIs in the Transliteration service, the
U_USE_DEPRECATED_TRANSLITERATOR_API symbol should be defined before
compiling ICU.</p>

<h2><a name="PlatformDependencies" href="#PlatformDependencies">Platform
|
|
Dependencies</a></h2>
|
|
|
|
<p>The platform dependencies have been mostly isolated into the following
|
|
files in the common library. This information can be useful if you are
|
|
porting ICU to a new platform.</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<strong>unicode/platform.h.in</strong> (autoconf'ed platforms)<br>
|
|
<strong>unicode/p<i>XXXX</i>.h</strong> (others: pwin32.h, pmacos.h,
|
|
..): Platform-dependent typedefs and defines:<br>
|
|
<br>
|
|
|
|
|
|
<ul>
|
|
<li>XP_CPLUSPLUS for C++ only.</li>
|
|
|
|
<li>TRUE and FALSE, UBool, int8_t, int16_t etc.</li>
|
|
|
|
<li>U_EXPORT and U_IMPORT for specifying dynamic library import and
|
|
export</li>
|
|
</ul>
|
|
<br>
|
|
</li>
|
|
|
|
<li>
|
|
<strong>unicode/putil.h, putil.c</strong>: platform-dependent
|
|
implementations of various functions that are platform dependent:<br>
|
|
<br>
|
|
|
|
|
|
<ul>
|
|
<li>uprv_isNaN, uprv_isInfinite, uprv_getNaN and uprv_getInfinity for
|
|
handling special floating point values.</li>
|
|
|
|
<li>uprv_tzset, uprv_timezone, uprv_tzname and time for getting
|
|
platform specific time and timezone information.</li>
|
|
|
|
<li>u_getDataDirectory for getting the default data directory.</li>
|
|
|
|
<li>uprv_getDefaultLocaleID for getting the default locale
|
|
setting.</li>
|
|
|
|
<li>uprv_getDefaultCodepage for getting the default codepage
|
|
encoding.</li>
|
|
</ul>
|
|
<br>
|
|
</li>
|
|
|
|
<li>
|
|
<strong>umutex.h, umutex.c</strong>: Code for doing synchronization in
|
|
multithreaded applications. If you wish to use International Components
|
|
for Unicode in a multithreaded application, you must provide a
|
|
synchronization primitive that the classes can use to protect their
|
|
global data against simultaneous modifications. See Users' guide for
|
|
more information.<br>
|
|
<br>
|
|
|
|
|
|
<ul>
|
|
<li>We supply sample implementations for WinNT, Win95, Win98,
|
|
Sun/Solaris, RedHat/Linux, HP-UX and for AIX on an RS/6000.</li>
|
|
</ul>
|
|
<br>
|
|
</li>
|
|
|
|
<li><strong>umapfile.h, umapfile.c</strong>: functions for mapping or
|
|
otherwise reading or loading files into memory. All access by ICU to data
|
|
from files makes use of these functions.<br>
|
|
<br>
|
|
</li>
|
|
|
|
<li>For the Intltest test suite, intltest.cpp in
|
|
"icu/source/test/intltest/" contains the method pathnameInContext, which
|
|
must also be adapted to any new platform.</li>
|
|
|
|
<li>Using platform specific #ifdef macros are highly discouraged outside
|
|
of the scope of these files. When the source code gets updated in the
|
|
future, these #ifdef's can cause testing problems for your platform.</li>
|
|
</ul>
|
|
|
|
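<p>When porting, the floating-point helpers in putil.c are often the first
functions to check. As a sketch of the kind of logic involved (not ICU's
actual implementation; the names are hypothetical), and assuming IEEE 754
arithmetic, a NaN can be detected because it compares unequal to itself,
and an infinity because subtracting it from itself yields NaN rather than
zero. S/390 hexadecimal floating point has no NaN or infinity, which is one
reason these functions are platform dependent.</p>

```c
/* Hypothetical checks in the spirit of uprv_isNaN and uprv_isInfinite.
   They assume IEEE 754 arithmetic and are not valid under options such
   as -ffast-math, which let the compiler assume NaN and infinity never
   occur. */

/* A NaN is the only value that compares unequal to itself. */
int my_isNaN(double x)
{
    return x != x;
}

/* For finite x, x - x is 0; for +/-infinity it is NaN. */
int my_isInfinite(double x)
{
    double diff = x - x;
    return !my_isNaN(x) && my_isNaN(diff);
}
```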
<p>It is possible to build each library and tool individually. They must be
built in the following order:</p>

<ol>
<li>stubdata</li>

<li>common</li>

<li>i18n</li>

<li>toolutil</li>

<li>makeconv</li>

<li>genrb</li>

<li>gentz</li>

<li>genccode</li>

<li>gennames</li>

<li>genuca</li>

<li>gennorm</li>

<li>makedata (a project on Windows, or source/data/Makefile on Unix)</li>

<li>ctestfw, intltest and cintltst, if you want to run the test
suite.</li>
</ol>
<hr>

<p>Copyright © 1997-2002 International Business Machines Corporation
and others. All Rights Reserved.<br>
IBM Globalization Center of Competency - San Jose,<br>
5600 Cottle Road, San José, CA 95193<br>
All rights reserved.</p>
</body>
</html>