<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
|
|
<html>
|
|
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<meta name="Template" content="F:\Program Files\Microsoft Office\Office\html.dot">
|
|
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
|
|
<title>ReadMe for ICU</title>
|
|
</head>
|
|
|
|
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
|
|
|
<h2>ReadMe: IBM Classes For Unicode</h2>
|
|
|
|
<p>Version: 12/28/2000<br>
|
|
</p>
|
|
|
|
<hr>
|
|
|
|
<p>COPYRIGHT: <br>
|
|
Copyright © 1997-2000 International Business Machines Corporation and others. All Rights
|
|
Reserved.</p>
|
|
|
|
<hr>
|
|
|
|
<p><br>
|
|
</p>
|
|
|
|
<h3><u>Contents</u></h3>
|
|
|
|
<ul>
|
|
<li><a href="#introduction">Introduction</a></li>
|
|
<li><a href="#WhatContain">What the IBM Classes for Unicode Contain</a></li>
|
|
<li><a href="#API">API overview</a></li>
|
|
<li><a href="#PlatformDependencies">Platform Dependencies</a></li>
|
|
<li><a href="#ImportantNotes">Important Installation Notes</a></li>
|
|
<li><a href="#HowToInstall">How to Install/Build</a></li>
|
|
<li><a href="#datahandling">How ICU handles data</a></li>
|
|
<li><a href="#CharsetConvert">Character Set Conversion Information</a></li>
|
|
<li><a href="#ProgrammingNotes">Programming Notes</a></li>
|
|
<li><a href="#WhereToFindMore">Where to Find More Information</a></li>
|
|
<li><a href="#SubmittingComments">Submitting Comments, Requesting Features and Reporting
|
|
Bugs</a></li>
|
|
</ul>
|
|
|
|
<h3><a NAME="introduction"></a><u>Introduction</u></h3>
|
|
|
|
<p>Today's software market is a global one in which it is desirable to develop and
|
|
maintain one application that supports a wide variety of national languages. IBM Classes
|
|
for Unicode provides the following tools to help you write language-independent
|
|
applications:
|
|
|
|
<ul>
|
|
<li>UnicodeString supporting the Unicode 3.0 standard</li>
|
|
<li>Resource bundles for storing and accessing localized information</li>
|
|
<li>Number formatters for converting binary numbers into text strings for meaningful display</li>
|
|
<li>Date and time formatters for converting internal time data into text strings for
|
|
meaningful display</li>
|
|
<li>Message formatters for putting together sequences of strings, numbers, dates and other
formats to create messages</li>
|
|
<li>Text collation supporting language sensitive comparison of strings</li>
|
|
<li>Text boundary analysis for finding character, word and sentence boundaries</li>
|
|
<li>Easy localization: applications written using these tools can be localized by changing
simple data files rather than modifying program code</li>
|
|
<li>Over 150 locales supported. Visit the <a
href="http://oss.software.ibm.com/developerworks/opensource/icu/localeexplorer">LocaleExplorer
(http://oss.software.ibm.com/developerworks/opensource/icu/localeexplorer)</a> site for a
demonstration and a full list of supported locales, or <a href="docs/supp_loc.html">click
here for a table of supported locales</a>.</li>
|
|
</ul>
|
|
|
|
<p>It is possible to support additional locales by adding more locale data files, with no
|
|
code changes. </p>
|
|
|
|
<p>Please refer to the POSIX Programmer's Guide for details on what the ISO locale ID means. </p>
|
|
|
|
<p>Your comments are important to making this release successful. We are committed
|
|
to fixing any bugs, and will also use your feedback to help plan future releases. </p>
|
|
|
|
<blockquote>
|
|
<p><b><u>IMPORTANT</u>: Please make sure you understand the <a href="license.html">Copyright
and License information</a>.</b></p>
|
|
</blockquote>
|
|
|
|
<blockquote>
|
|
<p> </p>
|
|
</blockquote>
|
|
|
|
<h3><a NAME="WhatContain"></a><u>What the IBM Classes For Unicode Contain</u></h3>
|
|
|
|
<p>There are two ways to download ICU:
|
|
|
|
<ul>
|
|
<li><strong>Official Release Snapshot:<br>
|
|
</strong>If you want to use ICU (as opposed to developing it), your best bet is to
download an official, packaged version of the ICU source code. These versions
|
|
are tested more thoroughly than day-to-day development builds of the system, and they are
|
|
packaged in zip and tar files for convenient download. These packaged files can be
|
|
found at <a
|
|
href="http://oss.software.ibm.com/developerworks/opensource/icu/project/download/index.html">http://oss.software.ibm.com/developerworks/opensource/icu/project/download/index.html</a>.<br>
|
|
The packaged snapshot is named <b>ICUXXXXXX.zip</b>, where XXXXXX is the release version number.<br>
Unzip this file to reconstruct the source directory. </li>
|
|
<li><strong>CVS Source Repository:<br>
|
|
</strong>If you are interested in developing features, patches, or bug fixes for ICU, you
|
|
should probably be working with the latest version of the ICU source code. You will need
|
|
to check the code out of our CVS repository to ensure that you have the most recent
|
|
version of all of the files. There are several ways to do this:<ul>
|
|
<li>WebCVS:<br>
|
|
If you want to browse the code and only make occasional downloads, you may want to use
|
|
WebCVS. It provides a convenient, web-based interface for browsing and downloading the
|
|
latest version of the ICU source code and documentation. You can also view each file's
|
|
revision history, display the differences between individual revisions, determine which
|
|
revisions were part of which official release, and so on. </li>
|
|
<li>WinCVS:<br>
|
|
If you will be doing serious work on ICU, you should probably install a CVS client on your
|
|
own machine so that you can do batch operations without going through the WebCVS
|
|
interface. On Windows, we suggest the WinCVS client. The following steps show
how to download ICU via WinCVS: <br>
1. Install the WinCVS client, which you can download from the WinCVS home page. <br>
2. In the WinCVS preferences, specify your CVSRoot to be
":pserver:anoncvs@oss.software.ibm.com:/usr/cvs/icu"
with the password "anoncvs". To enter the CVSRoot value, select
"Preferences" from the "Cvs Admin" pull-down menu.
Authentication should be set to "'passwd' file on the cvs server". <br>
3. To "extract" the most recent version of ICU, select "Checkout
|
|
module" from the "Cvs Admin" menu. Specify "icu" for the module
|
|
name. </li>
|
|
<li>CVS command line:<br>
|
|
You can also check out the repository anonymously on UNIX using the following commands,
|
|
after first setting your CVSROOT to point to the ICU repository: <br>
|
|
<br>
|
|
export CVSROOT=:pserver:anoncvs@oss.software.ibm.com:/usr/cvs/icu<br>
|
|
cvs login (enter "anoncvs" at the "CVS password:" prompt)<br>
|
|
cvs checkout icu<br>
|
|
cvs logout</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>For more details on how to download ICU directly from the web site, please also see <a
|
|
href="http://oss.software.ibm.com/developerworks/opensource/icu/project/download/index.html">http://oss.software.ibm.com/developerworks/opensource/icu/project/download/index.html</a>.</p>
|
|
|
|
<p>Below, <b>$Root</b> is the location of the icu directory in your file system, such as
"drive:\...\icu" in your environment. "drive:\..." stands for any
drive and any directory on that drive that you chose to install icu into.</p>
|
|
|
|
<p><b>The following files describe the code drop:</b> <br>
|
|
<br>
|
|
</p>
|
|
|
|
<table BORDER="1">
|
|
<tr>
|
|
<td>readme.html (this file)</td>
|
|
<td>describes the IBM Classes for Unicode</td>
|
|
</tr>
|
|
<tr>
|
|
<td>license.html</td>
|
|
<td>contains IBM's public license</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p><b>The following directories contain source code and data files:</b> <br>
|
|
<br>
|
|
</p>
|
|
|
|
<table BORDER="1" WIDTH="623">
|
|
<tr>
|
|
<td WIDTH="20%">$Root\source\common\</td>
|
|
<td WIDTH="80%">The utility classes, such as ResourceBundle, Unicode, Locale,
|
|
UnicodeString. The codepage conversion library API, UnicodeConverter.</td>
|
|
</tr>
|
|
<tr>
|
|
<td WIDTH="20%">$Root\source\i18n\</td>
|
|
<td WIDTH="80%">The collation source files, Collator, RuleBasedCollator and
|
|
CollationKey. <br>
|
|
The text boundary API, which locates character, word, sentence, and
line breaks. <br>
|
|
The format API, which formats and parses data in numeric or date format to and from text.</td>
|
|
</tr>
|
|
<tr>
|
|
<td WIDTH="20%">$Root\source\test\intltest\</td>
|
|
<td WIDTH="80%">A test suite including all C++ APIs. For information about running the
|
|
test suite, see <a href="docs/intltest.html">docs\intltest.html</a>.</td>
|
|
</tr>
|
|
<tr>
|
|
<td WIDTH="20%">$Root\source\test\cintltst\</td>
|
|
<td WIDTH="80%">A test suite including all C APIs. For information about running the test
|
|
suite, see <a href="docs/cintltst.html">docs\cintltst.html.</a></td>
|
|
</tr>
|
|
<tr>
|
|
<td WIDTH="20%">$Root\data\</td>
|
|
<td WIDTH="80%">The Unicode 3.0 data file. Please see <a
|
|
href="http://www.unicode.org/">http://www.unicode.org/</a> for more information. <br>
|
|
This directory also contains the resource files for all international objects. These
|
|
files are of the following types: <ul>
|
|
<li>TXT files contain general locale data. </li>
|
|
<li>RES files contain non-portable locale data files which are generated by the <strong>genrb</strong>
|
|
tool.</li>
|
|
<li>COL files are non-portable packed binary collation data files which are created by the <strong>gencol</strong>
|
|
tool. </li>
|
|
<li>UCM files contain mapping tables to and from Unicode in text format.</li>
|
|
<li>CNV files are non-portable packed binary conversion data generated by the <strong>makeconv</strong>
|
|
tool.</li>
|
|
<li>icudata.dll file contains data files in a dynamic loadable library format. At this
|
|
moment, this file contains CNV files, converter aliases, timezone data and Unicode
|
|
character names. Please read <a href="docs/udata.html">udata.html</a> for more
|
|
information.</li>
|
|
<li>icudata.dat file contains data files in a memory mapped file format. At this moment,
|
|
this file contains CNV files, converter aliases, timezone data and Unicode character
|
|
names. Please read <a href="docs/udata.html">udata.html</a> for more information.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td WIDTH="20%">$Root\source\tools</td>
|
|
<td WIDTH="80%">Tools for generating the data files. Data files are generated by invoking
|
|
$Root\source\tools\makedata.bat on Win32 or $Root\source\make install on Unix.</td>
|
|
</tr>
|
|
<tr>
|
|
<td WIDTH="20%">$Root\source\samples</td>
|
|
<td WIDTH="80%">Various sample programs that use ICU</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p> <b>The following directories are populated when you've built the framework:</b> <br>
|
|
(on Unix, replace $Root with the value given to the file "configure") <br>
|
|
</p>
|
|
|
|
<table BORDER="1">
|
|
<tr>
|
|
<td>$Root\include\</td>
|
|
<td>contains all the public header files.</td>
|
|
</tr>
|
|
<tr>
|
|
<td>$output</td>
|
|
<td>contains the libraries for static/dynamic linking or executable programs.</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p><b>The following diagram shows the main directory structure of the IBM Classes for
|
|
Unicode:</b> </p>
|
|
|
|
<pre>
                 icu-NNNN
                     |
   output                          icu
 _____|_____          ______________|______________________________
 |         |          |          |          |         |           |
libraries programs include     data      source       |           |
 (built)  (built)  (built)                  |    readme.html license.html
                                            |
                           _________________|__________________________
                           |      |       |         |         |       |
                        common  i18n    test      extra     tools  samples
                                          |                   |
                                       ___|___             ___|_________________
                                       |     |             |      |      |     |
                                  intltest cintltst   makeconv ctestfw genrb ....
</pre>
|
|
|
|
<h3><a NAME="API"></a><u>API Overview</u></h3>
|
|
|
|
<p>The IBM Classes for Unicode APIs fall into two categories:
|
|
|
|
<ul>
|
|
<li>Low-level Unicode/Resource Attributes: (<strong>icuuc</strong> library)<ul>
|
|
<li><a href="docs/utilCL.html">Utility Classes</a></li>
|
|
<li><a href="docs/conversion_interface.htm">Conversion Interface</a></li>
|
|
</ul>
|
|
</li>
|
|
<li>High-level Unicode Internationalization: (<strong>icui18n</strong> library)<ul>
|
|
<li><a href="docs/boundCL.html">Text Boundary Classes</a></li>
|
|
<li><a href="docs/collateCL.html">Collation Classes</a></li>
|
|
<li><a href="docs/formatCL.html">Formatting Classes</a></li>
|
|
<li>Transliterator Classes</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>See <a href="docs/codeConv.html">IBM Classes for Unicode Code Conventions</a> for a
|
|
discussion of code conventions common to all library classes. </p>
|
|
|
|
<p>See also <a href="html/aindex.html">html/aindex.html</a> for an alphabetical index, and
|
|
<a href="html/HIERjava.html">html/HIERjava.html</a> for a hierarchical index to detailed
|
|
API documentation. <br>
|
|
<br>
|
|
</p>
|
|
|
|
<h3><a NAME="PlatformDependencies"></a><u>Platform Dependencies</u></h3>
|
|
|
|
<p>The platform dependencies have been isolated into the following 4 files:
|
|
|
|
<ul>
|
|
<li><u>platform.h.in:</u> Platform-dependent typedefs and defines:</li>
|
|
</ul>
|
|
|
|
<blockquote>
|
|
<ul>
|
|
<li>XP_CPLUSPLUS is defined for C++</li>
|
|
<li>bool_t, TRUE and FALSE, int8_t, int16_t etc.</li>
|
|
<li>U_EXPORT and U_IMPORT for specifying dynamic library import and export</li>
|
|
</ul>
|
|
</blockquote>
|
|
|
|
<ul>
|
|
<li><u>putil.c:</u> platform-dependent implementations of various functions
(declared in putil.h):</li>
|
|
</ul>
|
|
|
|
<blockquote>
|
|
<ul>
|
|
<li>icu_isNaN, icu_isInfinite, icu_getNaN and icu_getInfinity for handling special
floating point values</li>
|
|
<li>icu_tzset, icu_timezone, icu_tzname and time for reading platform specific time and
|
|
timezone information</li>
|
|
<li>icu_getDefaultDataDirectory, icu_getDefaultLocaleID for reading the locale setting and
|
|
data directory</li>
|
|
<li>icu_isBigEndian for finding the endianness of the platform</li>
|
|
<li>icu_nextDouble is used specifically by the ChoiceFormat API.</li>
|
|
</ul>
|
|
</blockquote>
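<p>As a rough illustration of the kind of checks listed above for putil.c, the following C
sketch detects byte order at run time and classifies special floating point values using
only standard C99. The names sketch_is_big_endian, sketch_is_nan and sketch_is_infinite
are hypothetical; this is not the putil.c implementation, which may work differently on
each platform.</p>

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the ICU function): returns 1 on a big-endian
   platform and 0 otherwise, by looking at the first byte of a known
   16-bit value as it is laid out in memory. */
int sketch_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

/* Hypothetical helpers built on the C99 classification macros. */
int sketch_is_nan(double value)
{
    return isnan(value) ? 1 : 0;
}

int sketch_is_infinite(double value)
{
    return isinf(value) ? 1 : 0;
}
```

<p>A caller would typically use such a byte-order check once, for example before deciding
whether a binary data file written on another machine can be read directly.</p>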
|
|
|
|
<ul>
|
|
<li><u>mutex.h and mutex.cpp</u>: Code for doing synchronization in multithreaded
|
|
applications. If you wish to use IBM Classes for Unicode in a multithreaded application,
|
|
you must provide a synchronization primitive that the classes can use to protect their
|
|
global data against simultaneous modifications. See <a href="docs/mutex.html">docs\mutex.html</a>
|
|
for more information.</li>
|
|
<ul>
|
|
<li>We supply sample implementations for WinNT, Win95, Win98, Sun/Solaris, RedHat/Linux,
|
|
HP-UX and for AIX on an RS/6000.</li>
|
|
<li>If you are changing the platform-dependent files, ptypes.h and putil.h may also be
|
|
interesting, but shouldn't have to be changed. If you think any files other than the ones
|
|
mentioned above have platform dependencies, please contact us.</li>
|
|
<li>For the Intltest test suite, intltest.cpp in "icu\source\test\intltest\"
|
|
contains the method pathnameInContext, which must also be adapted to any new platform.</li>
|
|
</ul>
|
|
</ul>
|
|
|
|
<h3><a NAME="ImportantNotes"></a><b><u>Important Installation Notes </u></b></h3>
|
|
|
|
<p><strong>Win32 Platform</strong></p>
|
|
|
|
<p>If you are building on the Win32 platform, it is important that you understand a few
|
|
build details: </p>
|
|
|
|
<p><u>DLL directories and the PATH setting:</u> As delivered, the IBM Classes for Unicode
|
|
build as several DLLs. These DLLs are placed in the directories "icu\bin\Debug"
|
|
and "icu\bin\Release". You must add either of these directories to the
|
|
PATH environment variable in your system, or any executables you build will not be able to
|
|
access IBM Classes for Unicode libraries. Alternatively, you can copy the DLL files into a
|
|
directory already in your PATH, but we do not recommend this: you can end up with
multiple copies of the DLL and use the wrong one. </p>
|
|
|
|
<p><u>To change your PATH:</u> Do this under NT by using the System control panel.
|
|
Pick the "Environment" tab, select the variable PATH in the lower box. In
|
|
the "value" box, append the string ";drive:\...\icu\bin\Debug" at the
|
|
end of the path string. If there is nothing there, just type in
|
|
"drive:\...\icu\bin\Debug". Click the Set button, then the Ok button. </p>
|
|
|
|
<p><u>Link with Runtime libraries:</u> All the DLLs link with the C runtime library
|
|
"Debug Multithreaded DLL" or "Multithreaded DLL." (This is changed
|
|
through the Project Settings dialog, on the C/C++ tab, under Code Generation.) It is
|
|
important that any executable or other DLL you build which uses the IBM Classes for
|
|
Unicode DLLs links with these runtime libraries as well. If you do not do this, you will
get what appear to be memory errors when you run the executable. <br>
|
|
</p>
|
|
|
|
<p><strong>OS/390 Platform</strong></p>
|
|
|
|
<p>If you are building on the OS/390 Unix System Services platform, it is important that
you understand a few details. <br>
<br>
The GNU utilities gmake and gzip/gunzip are needed and can be obtained for OS/390 from
www.mks.com. Search for os/390, register, and follow the download directions. <br>
<br>
DLL directories and the LIBPATH setting: The directory containing the ICU DLLs libicu-i18n and
libicu-uc.dll should be added to the LIBPATH environment variable concatenation.<br>
<br>
OS/390 supports both native hexadecimal floating point and, with Version 2.6 and later,
IEEE binary floating point. This is a compile-time option. Applications built with IEEE
should use ICU DLLs which are built with IEEE (and vice versa). Setting the environment
variable IEEE390=1 will cause the OS/390 version of ICU to be built with IEEE floating
point. The default is native hexadecimal floating point. <br>
<br>
The makedep executable is shipped with ICU for use with the OS/390 ICU build process. The
PATH environment variable should be updated to contain the location of this executable
prior to the build. Alternatively, makedep may be moved into an existing PATH directory.<br>
<br>
When running the test suite, the TZ environment variable should be set to PST8PDT
(export TZ="PST8PDT") so that timezone comparisons are correct.</p>
|
|
|
|
<h3><a NAME="HowToInstall"></a><u>How to Install/Build on Win NT</u></h3>
|
|
|
|
<p>Building IBM Classes for Unicode requires:
|
|
|
|
<ul>
|
|
<li>Microsoft Windows NT 3.51 or above</li>
|
|
<li>Microsoft Visual C++ 6.0 (Service Pack 2 is required to work with the release build of
|
|
max speed optimization).</li>
|
|
</ul>
|
|
|
|
<p>The steps are:
|
|
|
|
<ol>
|
|
<li>Unzip the icu-XXXX.zip file: type "unzip -a icu-XXXX.zip -d drive:\directory"
at a command prompt or use WinZip. drive:\directory\icu is the root ($Root)
directory (you may, but don't need to, place "icu" into another directory). If you
change the root, you will need to change the project settings accordingly in EACH makefile in the
project, updating the include and library paths.</li>
|
|
<li>Set the environment variable <strong>ICU_DATA</strong>, the full pathname of the data
|
|
directory, to indicate where the locale data files and conversion mapping tables are.</li>
|
|
<li>Start Microsoft Visual C++ 6.0.</li>
|
|
<li>Choose "File" menu and select "Open WorkSpace".</li>
|
|
<li>In the file chooser, choose icu\source\allinone\allinone.dsw. Open this workspace.</li>
|
|
<li>This workspace includes all the IBM Classes for Unicode libraries, the necessary tools, and
the intltest and cintltst test suite projects.</li>
|
|
<li>Set the active Project. Choose "Project" menu and select "Set active
|
|
project". In the submenu, select "all" workspace.</li>
|
|
<li>Set the active configuration ("Win32 Debug" or "Win32 Release") and
make sure this matches your PATH setting as described in the previous section. (See the note
below.)</li>
|
|
<li>Choose "Build" menu and select "Rebuild All". If you want to build
|
|
the Debug and Release configurations at the same time, choose "Build" menu and
|
|
select "Batch Build..." instead (and mark all configurations as checked), then
|
|
click the button named "Rebuild All".</li>
|
|
<li>The "all" workspace will build all the test programs as well as the tools for
|
|
generating binary locale data files. The "makedata" project will be run
|
|
automatically to convert the locale data files from text format into icudata.dll.</li>
|
|
<li>Save the value of the <strong>TZ</strong> environment variable and then set it to <strong>PST8PDT</strong>.
|
|
</li>
|
|
<li>Reopen the "allinone" project file and run the "intltest" test.
|
|
Reset the <strong>TZ</strong> value.</li>
|
|
<li>To run the C test suite, set "cintltst" as the active project, repeat step 11,
and then run the "cintltst" test.</li>
|
|
<li>Build and run as outlined above.</li>
|
|
</ol>
|
|
<p><b>Note: </b>To set the active configuration, two different possibilities are:
|
|
|
|
<ul>
|
|
<li>Choose "Build" menu, select "Set Active Configuration", and select
|
|
"Win32 Release" or "Win32 Debug".</li>
|
|
<li>Another way is to select "Customize" in the "Tools" menu, select the
|
|
"Toolbars" tab, enable "Build" instead of "Build Minibar",
|
|
and click on "Close". This will bring up a toolbar which you can move alongside the
|
|
other permanent toolbars at the top of the MSVC window. The advantage is that you now have
|
|
an easy-to-reach pop-up menu which will always show the currently selected active
|
|
configuration. Or, you can drag the project and configuration selectors and drop
|
|
them on the menu bar for later selection.</li>
|
|
</ul>
|
|
|
|
<p>It is also possible to build each library individually, using the workspaces in each
|
|
respective directory. They have to be built in the following order: <br>
|
|
1. common <br>
|
|
2. i18n <br>
|
|
3. makedata (which invokes makeconv, genrb,
|
|
gencol, genccode etc.)<br>
|
|
4. ctestfw <br>
|
|
5. intltest and cintltst, if you want to run
|
|
the test suite. <br>
|
|
Regarding the test suite, please read the directions in <a href="docs/intltest.html">docs/intltest.html</a>
|
|
and <a href="docs/cintltst.html">docs/cintltst.html</a> </p>
|
|
|
|
<h3>How to Install/Build on Unix</h3>
|
|
|
|
<p>There is a set of Makefiles for Unix which supports Linux w/gcc, Solaris w/gcc and
|
|
Workshop CC, AIX w/xlc and OS/390 with C++.</p>
|
|
|
|
<p>Building IBM Classes for Unicode on Unix requires: </p>
|
|
|
|
<p>A UNIX C++ compiler (gcc, cc, xlc_r, etc.) installed on the target machine, and a recent
version of GNU make (3.7+). OS/390 GNU utilities for both make (gmake) and zip
(gzip/gunzip) can be found at the MKS web site at <a href="http://www.mks.com">http://www.mks.com</a>.
Please do a search on "os/390".</p>
|
|
|
|
<p>The steps are:
|
|
|
|
<ol>
|
|
<li>Unpack the icuXXXX.tar (or icuXXXX.tgz) file.</li>
|
|
<li>Before running the test programs or samples, please set the environment variable <strong>ICU_DATA</strong>,
|
|
the full pathname of the data directory, to indicate where the locale data files and
|
|
conversion mapping tables are. If this variable is not set, the default user data
|
|
directory will be used.</li>
|
|
<li>Change directory to "icu/source".</li>
|
|
<li>If it is not already set, please set the executable flag for the following files (by
executing the 'chmod +x' command): configure, install.sh and config.*.</li>
|
|
<li>You also need to set other environment variables for different build systems. Use this <a
href="docs/build_env.htm">table</a> or the provided <a href="source/runConfigureICU">script</a>.</li>
|
|
<li>Type "./configure" to configure the build, or type "./configure --help" to print the
available options.</li>
|
|
<li>Type "make" to compile the libraries and all the data files. On OS/390,
|
|
both IEEE binary and native hexadecimal floating point calculations are supported.
|
|
The default is to build with native floating point support. Please set the
|
|
environment variable IEEE390=1 if you would like to make the ICU DLLs with IEEE floating
|
|
point support.</li>
|
|
<li>Optionally, type "make check" to run the test suite.</li>
|
|
<li>Type "make install" to install.</li>
|
|
</ol>
|
|
|
|
<p>It is also possible to build each library individually, using the Makefiles in each
|
|
respective directory. They have to be built in the following order: <br>
|
|
1. common <br>
|
|
2. i18n <br>
|
|
3. makeconv <br>
|
|
4. genrb<br>
|
|
5. gencol<br>
|
|
6. gentz<br>
|
|
7. genccode<br>
|
|
8. ctestfw <br>
|
|
9. intltest and cintltst, if you want to run
|
|
the test suite. <br>
|
|
Regarding the test suite, please read the directions in <a href="docs/intltest.html">docs/intltest.html</a>
|
|
and <a href="docs/cintltst.html">docs/cintltst.html</a> </p>
|
|
<h1><a NAME="datahandling"></a>How ICU handles data</h1>
|
|
|
|
<h3><u>How to add a locale data file</u></h3>
|
|
|
|
<p>To add locale data files to IBM Classes for Unicode do the following: </p>
|
|
|
|
<blockquote>
|
|
<p>1. Create a file containing the key-value pairs whose values you are overriding from the
parent locale data file. Make sure the filename is the locale ID with the extension
".txt". We recommend that you copy the parent file, change the values
that need to be changed, and remove all other key-value pairs. Be sure to update
the locale ID key (the outermost brace) with the name of the locale ID you are creating.</p>
|
|
</blockquote>
|
|
|
|
<blockquote>
|
|
<p>2. Name the file with the locale ID you are creating, followed by ".txt".</p>
|
|
</blockquote>
|
|
|
|
<blockquote>
|
|
<blockquote>
|
|
<p>For example, fr_BF.txt
would create a locale that inherits all the key-value pairs from fr.txt.</p>
|
|
</blockquote>
|
|
</blockquote>
|
|
|
|
<blockquote>
|
|
<p>3. Add the name of that file (without the ".txt" extension) as a single line
in the "index.txt" file in the default locale directory (icu/data/).</p>
|
|
<p>4. Regenerate the data DLL file. Please see the "<a href="#HowToInstall">How to
Install</a>" section for more details on how to build and verify the ICU release.</p>
|
|
</blockquote>
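<p>The inheritance described in step 2 (fr_BF.txt inheriting from fr.txt) can be pictured as
dropping the last underscore-separated field of the locale ID to find the parent file. The
C sketch below illustrates only that naming relationship. The function name
sketch_parent_locale is hypothetical, and returning an empty string for an ID without an
underscore is an assumption of this sketch, not a statement about how ICU resolves its
top-level data.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an ICU function): writes the parent of
   locale_id into parent, which has room for capacity bytes including
   the terminating NUL, by dropping the last underscore-separated field,
   e.g. "fr_BF" -> "fr".
   Returns 1 on success and 0 if the buffer is too small. */
int sketch_parent_locale(const char *locale_id, char *parent, size_t capacity)
{
    const char *last_underscore = strrchr(locale_id, '_');
    /* Assumption: with no underscore there is nothing left to strip,
       so the parent name is empty. */
    size_t length = (last_underscore != NULL)
                        ? (size_t)(last_underscore - locale_id)
                        : 0;

    if (length + 1 > capacity) {
        return 0;
    }
    memcpy(parent, locale_id, length);
    parent[length] = '\0';
    return 1;
}
```

<p>With this helper, the parent data file name for a new locale would be the returned ID
followed by ".txt", which is a quick way to check which file a new locale file overrides.</p>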
|
|
|
|
<p><a NAME="addrbdatatoapp"></a></p>
|
|
<p><b><u><font size="+1">How to add resource bundle data to your application</font></u></b></p>
|
|
|
|
<p>Adding resource bundle data to your application is quite simple: </p>
|
|
|
|
<p>Create resource bundle files with the right format and names in a directory for
resource bundles that you create in your application directory tree. (For more information on
the format of these files, see the <a href="../icuhtml/ResourceBundle.html#DOC.DOCU">resource
bundle documentation</a> or the <a
href="http://www.ibm.com/java/education/international-unicode/unicodec.html">resource
bundle format</a>.) <br>
Please note that resource bundle tag names should contain only invariant 7-bit ASCII
characters (e.g. ones from the following set: A-Z, a-z, 0-9, &lt;SP&gt;, ", %, &amp;,
`, (, ), *, +, ,, -, ., /, :, ;, &lt;, =, &gt;, ?, _).<br>
Use that same directory name (absolute path) when instantiating a resource bundle at run
time.</p>
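<p>A tag name can be checked against the character set above before it is placed in a
resource bundle file. The following C sketch is one way to do that. The function name
sketch_is_invariant_tag is hypothetical, the punctuation set is copied from the list in
this document as written, and the letter and digit tests assume an ASCII-based execution
character set.</p>

```c
#include <string.h>

/* Hypothetical helper (not an ICU function): returns 1 if every
   character of tag_name is a letter, a digit, or one of the characters
   from the list above, and 0 otherwise.
   Assumption: the execution character set is ASCII-based, so the
   ranges 'A'..'Z', 'a'..'z' and '0'..'9' are contiguous. */
int sketch_is_invariant_tag(const char *tag_name)
{
    /* Space plus the punctuation copied from the list above. */
    static const char allowed_punctuation[] = " \"%&`()*+,-./:;<=>?_";
    const char *p;

    for (p = tag_name; *p != '\0'; ++p) {
        char c = *p;
        int is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        int is_digit = (c >= '0' && c <= '9');

        if (!is_letter && !is_digit &&
            strchr(allowed_punctuation, c) == NULL) {
            return 0;
        }
    }
    return 1;
}
```

<p>For example, a tag such as "CollationElements" passes this check, while a tag containing
an accented letter does not.</p>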
|
|
|
|
<p><a NAME="WhereCollation"></a></p>
|
|
|
|
<h3><u>Where Collation Data is stored</u></h3>
|
|
|
|
<p>Collation data is stored in a single directory on a local disk. Each locale's data is
stored in a corresponding ASCII text file, under a "CollationElements" tag.
For instance, the data for de_CH is stored with a tag "CollationElements" in a
file named "de_CH.txt". Reading the collation data from these files can be
time-consuming, especially for the large tables needed by languages such as
Japanese. For this reason, the Collation Framework implements a second file format, a
performance-optimized, non-portable, binary format. These binary files are generated
automatically by the framework the first time a collation table is parsed. They have names
of the form "de_CH.col". Once the files are generated by the framework, future
loading of those collations occurs from the binary file, rather than the text file, at much
higher speed. </p>
|
|
|
|
<p>In general, you don't have to do anything special with these files. They can be
|
|
generated directly by using the "gencol" tool. In addition, they can also
|
|
be generated and used automatically by the framework, without intervention on your part.
|
|
However, there are situations in which you will have to regenerate them. To do so, you
|
|
must manually delete the ".col" files from your collation data directory and
|
|
re-run the gencol tool.</p>
|
|
|
|
<p>You will need to regenerate your ".col" files in the following circumstances:
|
|
|
|
<ol>
|
|
<li>You are moving your data to another platform. Since the ".col" files are
non-portable, you must make sure they are regenerated. <b>DO NOT </b>copy them from one
platform to another.</li>
|
|
<li>You have changed the "CollationElements" data in the locale's ".txt"
|
|
file. Note that if you change the default rules for some reason, which underlie all
|
|
collations, then you will have to rebuild ALL your ".col" files, since they all
|
|
are merged with the default rule set.</li>
|
|
</ol>
|
|
|
|
<h3><a NAME="CharsetConvert"></a><u>Character Set Conversion Information</u></h3>
|
|
|
|
<p>The charset conversion library provides ways to convert simple text strings (e.g.,
char*) in encodings such as ISO 8859-1 to and from Unicode. The objective is to provide clean, simple,
reliable, portable and adaptable data structures and algorithms to support the IBM Classes
for Unicode's character codeset conversion APIs. The character set conversion library includes
single-byte, double-byte and some UCS encodings to and from Unicode. The conversion data in the library
originated from the NLTC lab in IBM. The IBM character set conversion tables are publicly
available in the published IBM document called "CHARACTER DATA REPRESENTATION
ARCHITECTURE - REFERENCE AND REGISTRY". This document can be
ordered through Mechanicsburg and it comes with 2 CD-ROMs which have machine-readable
conversion tables on them. The license agreement is included in the IBM Classes for Unicode
agreement. </p>
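<p>To make the idea of converting a char* string "to and from Unicode" concrete, the C
sketch below handles the simplest possible case by hand: every ISO 8859-1 byte value is
also its Unicode code point (U+0000 to U+00FF), so the conversion is a widening copy. The
function names are hypothetical and this is not the ICU conversion API, which is table
driven and covers many more encodings. The use of a caller-supplied substitute character
in the reverse direction is an assumption of this sketch.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the ICU converter API): converts length
   bytes of ISO 8859-1 text to 16-bit Unicode code units.
   Returns the number of code units written, or 0 if dest is too small. */
size_t sketch_latin1_to_unicode(const char *src, size_t length,
                                uint16_t *dest, size_t dest_capacity)
{
    size_t i;

    if (length > dest_capacity) {
        return 0;
    }
    for (i = 0; i < length; ++i) {
        /* Go through unsigned char so bytes above 0x7F are not
           sign-extended. */
        dest[i] = (uint16_t)(unsigned char)src[i];
    }
    return length;
}

/* Reverse direction: code units above U+00FF have no ISO 8859-1
   equivalent and are replaced with substitute (an assumption of this
   sketch). Returns the number of bytes written, or 0 if dest is too
   small. */
size_t sketch_unicode_to_latin1(const uint16_t *src, size_t length,
                                char *dest, size_t dest_capacity,
                                char substitute)
{
    size_t i;

    if (length > dest_capacity) {
        return 0;
    }
    for (i = 0; i < length; ++i) {
        dest[i] = (src[i] <= 0xFF) ? (char)(unsigned char)src[i]
                                   : substitute;
    }
    return length;
}
```

<p>Encodings other than ISO 8859-1 need real mapping tables, which is what the conversion
data described above provides.</p>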
|
|
|
|
<p>Click <a href="data/convrtrs.txt">here</a> to view converters implemented in ICU. To
|
|
see converters in action, please visit <a
|
|
href="http://oss.software.ibm.com/developerworks/opensource/icu/localeexplorer/?converter&"><font
|
|
COLOR="#000000" size="3">http://oss.software.ibm.com/developerworks/opensource/icu/localeexplorer/?converter&</font></a></p>
|
|
|
|
<p>To order the document in the US you can call 1-800-879-2755 and request document number
|
|
SC09-2190-00. The cost of this publication is $75.00 US not including tax. </p>
|
|
|
|
<h3><a NAME="ProgrammingNotes"></a><u>Programming Notes</u></h3>
|
|
|
|
<h4><b><u>Reporting Errors</u></b></h4>
|
|
|
|
<p>In order for the code to be portable, the implementation can use only a subset of the
C++ language that will compile correctly on even the oldest of C++ compilers (and that
also allows a usable C interface). This means that the code makes no use of the C++
exception mechanism. </p>
|
|
|
|
<p>After considering many alternatives, the decision was that every function that can fail
takes an error-code parameter by reference. This is always the last parameter in the
function’s parameter list. The ErrorCode type is defined as an enumerated type. Zero
represents no error, positive values represent errors, and negative values represent
non-error status codes. Two macros, SUCCESS and FAILURE, are provided to check the error
code. </p>
|
|
|
|
<p>The ErrorCode parameter is an input-output parameter. Every function tests the error
code before doing anything else, and exits immediately if it holds a FAILURE error code.
If the function fails later on, it sets the error code appropriately and exits without
doing any other work (except, of course, any cleanup it has to do). If the function
encounters a non-error condition it wants to signal (such as "encountered an
unmappable character" in transcoding), it sets the error code appropriately and
continues. Otherwise, the function leaves the error code unchanged. </p>
|
|
|
|
<p>Generally, only functions that don't take an ErrorCode parameter, but call
functions that do, have to declare one. Almost all functions that take an ErrorCode
parameter and also call other functions that do merely have to propagate the error code
they were passed down to the functions they call. Functions that declare a new ErrorCode
variable must initialize it to ZERO_ERROR before calling any other functions. </p>
|
|
|
|
<p>The rationale here is to allow a function to call several functions (that take error
|
|
codes) in a row without having to check the error code after each one. [A function usually
|
|
will have to check the error code before doing any other processing, however, since it is
|
|
supposed to stop immediately after receiving an error code.] Propagating the error-code
|
|
parameter down the call chain saves the programmer from having to declare one everywhere,
|
|
and also allows us to more closely mimic the C++ exception protocol. </p>
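<p>The convention above can be sketched in a few lines of C. This is an illustration only:
the <font size="-1" face="Courier New">Demo</font>/<font size="-1" face="Courier New">demo_</font>
names, the enum values and both helper functions are hypothetical stand-ins invented for this
sketch and are not taken from the ICU headers. The sketch shows a caller that declares and
initializes one error code, chains two calls, and checks the code once at the end. </p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the ICU declarations. */
typedef enum {
    DEMO_UNMAPPABLE_WARNING = -1,    /* negative: non-error status code */
    DEMO_ZERO_ERROR = 0,             /* zero: no error */
    DEMO_ILLEGAL_ARGUMENT_ERROR = 1  /* positive: error */
} DemoErrorCode;

#define DEMO_SUCCESS(code) ((code) <= DEMO_ZERO_ERROR)
#define DEMO_FAILURE(code) ((code) > DEMO_ZERO_ERROR)

/* Copies src into dst, replacing each non-ASCII byte with '?'. */
void demo_toAscii(const char *src, char *dst, size_t dstSize, DemoErrorCode *err) {
    size_t i;
    if (DEMO_FAILURE(*err)) {
        return;                              /* incoming failure: do no work */
    }
    if (src == NULL || dst == NULL || dstSize == 0) {
        *err = DEMO_ILLEGAL_ARGUMENT_ERROR;  /* fail and exit immediately */
        return;
    }
    for (i = 0; src[i] != '\0' && i + 1 < dstSize; i++) {
        if ((unsigned char)src[i] > 0x7F) {
            dst[i] = '?';
            *err = DEMO_UNMAPPABLE_WARNING;  /* non-error status: keep going */
        } else {
            dst[i] = src[i];
        }
    }
    assert(i < dstSize);                     /* room is left for the terminator */
    dst[i] = '\0';
}

/* Returns the length of s, or 0 if an error is pending or s is NULL. */
size_t demo_length(const char *s, DemoErrorCode *err) {
    size_t n = 0;
    if (DEMO_FAILURE(*err)) {
        return 0;
    }
    if (s == NULL) {
        *err = DEMO_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* A caller that declares the error code, chains two calls, and checks once. */
size_t demo_asciiLength(const char *src) {
    char buffer[32];
    DemoErrorCode err = DEMO_ZERO_ERROR;     /* must be initialized before use */
    size_t length;
    demo_toAscii(src, buffer, sizeof buffer, &err);
    length = demo_length(buffer, &err);      /* returns at once if err is a failure */
    return DEMO_FAILURE(err) ? 0 : length;
}
```

<p>Because each function returns at once when it is handed a failing code, the caller does
not need a check between the two calls. The warning set for an unmappable byte is negative,
so it does not stop the chain. </p>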
|
|
|
|
<h4><b><u>C Function and Data Type Naming</u></b></h4>
|
|
<p><b>Function names.</b> If a function is identical (or almost identical) to an ANSI or
POSIX function, we give it the same name and (as much as possible) the same parameter
list. A "u" is prepended onto the beginning of the name. </p>
|
|
|
|
<p>For functions that existed prior to version 1.2.1, the function name should begin
with a lower-case "u". After the "u" is a short code identifying the
subsystem it belongs to (e.g., "loc", "rb", "cnv",
"coll", etc.). This code is separated from the actual function name by an
underscore, and the actual function name can be anything. For example, </p>
|
|
|
|
<blockquote>
|
|
<pre><font size="-1">UChar* uloc_getLanguage(...);
void uloc_setDefaultLocale(...);
UChar* ures_getString(...);</font></pre>
|
|
</blockquote>
|
|
|
|
<p><b>Struct and enum type names.</b> For structs and enum types, the rule is that their
names begin with a capital "U". There is no underscore in struct names.</p>
|
|
|
|
<pre><font size="-1" face="Courier New">    UResourceBundle;
    UCollator;
    UCollationResult;</font></pre>
|
|
<p><b>Enum value names.</b> Enumeration values have names that begin with "UXXX"
where XXX stands for the name of the functional category.</p>
|
|
|
|
<blockquote>
|
|
<pre><font size="-1" face="Courier New">UNUM_DECIMAL;
UCOL_GREATER;</font></pre>
|
|
</blockquote>
|
|
<p><b>Macro names.</b> Macro names are in all caps, but there are currently no other
requirements. </p>
|
|
|
|
<p><b>Constant names.</b> Many constant names (constants defined with "const",
not macros defined with "#define" that are used as constants) begin with a
lowercase k, but this isn't universally enforced. </p>
|
|
|
|
<h4><b><u>Preflighting and Overflow Handling</u></b></h4>
|
|
|
|
<p>In ICU's C APIs, the user needs to adhere to the following principles for consistency
|
|
across all functional categories:
|
|
|
|
<ol>
|
|
  <li>All Unicode string processing should be expressed in terms of a UChar* buffer that
    is always null terminated.</li>
  <li>The APIs assume that the input string parameters are statically allocated, fixed-size
    character buffers.</li>
  <li>When the value a function is going to return is already stored as a constant value in
    static space (e.g., it comes from a fixed table, or is stored in a cache), the
    function will just return the const UChar* pointer.</li>
  <li>When the function cannot return a UChar* pointing to storage that the user does not
    have to delete, the caller needs to pass in a pointer to a character buffer that the
    function can fill with the result. This pointer needs to be accompanied by an int32_t
    parameter that gives the size of the buffer.</li>
|
|
</ol>
|
|
|
|
<p>To find out how large the result buffer should be, ICU provides a <strong>preflighting</strong>
|
|
C interface. The interface works like this:
|
|
|
|
<ol>
|
|
  <li>When using the "<b>preflighting</b>" option, pass the function a NULL pointer for
    the buffer pointer, and the function returns the actual size of the result. You can
    then allocate a buffer of the correct size and re-run the operation if you would like to.</li>
  <li>Allocate a buffer of some reasonable size on the stack and pass it to the function. If
    the result fits in that buffer, everything works fine. If the result doesn't fit, the
    function returns the actual size needed. You can then allocate a buffer of the correct
    size on the heap and call the same function again.</li>
  <li>Allocate a buffer of some reasonable size on the stack and pass it to the function. If
    you don't care about the completeness of the result and the allocated buffer is too
    small, you can continue using the truncated result.</li>
|
|
</ol>
|
|
|
|
<p>The following prototype demonstrates the preflighting interface used by the three
options above: </p>
|
|
|
|
<blockquote>
|
|
<pre><font size="-1" face="Courier New">/**
 * @param result is a pointer to where the actual result will be.
 * @param maxResultSize is the number of characters the buffer pointed to by result has room for.
 * @return The actual length of the result (counting the terminating null)
 */
int32_t doSomething( /* input params */, UChar* result,
                     int32_t maxResultSize, UErrorCode* err);</font></pre>
|
|
</blockquote>
|
|
|
|
<p>In this sample, if the actual result doesn't fit in the space given by <font
size="-1" face="Courier New">maxResultSize</font>, this function returns the amount of
space necessary to hold the result, and result holds as many characters of the actual
result as possible. If you don't care about this, no further action is necessary. If
you <i>do </i>care about the truncated characters, you can then allocate a buffer on the
heap of the size specified by the return value and call the function again, passing <i>that
</i>buffer's address for result. </p>
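<p>A minimal sketch of a function with this shape, and of a caller that preflights and then
allocates, might look like the following. Everything named <font size="-1"
face="Courier New">Demo</font> or <font size="-1" face="Courier New">demo_</font> is
hypothetical and exists only for this illustration, including the 16-bit character type
standing in for UChar. The sketch also assumes that a truncated result is still null
terminated, which the text above does not specify. </p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t DemoUChar;  /* hypothetical stand-in for a 16-bit UChar */

typedef enum {
    DEMO_ZERO_ERROR = 0,
    DEMO_BUFFER_OVERFLOW_ERROR = 1,
    DEMO_ILLEGAL_ARGUMENT_ERROR = 2
} DemoErrorCode;

/*
 * Fills result with `count` copies of c followed by a terminating null.
 * Returns the size needed for the full result, counting the terminating null.
 */
int32_t demo_repeat(DemoUChar c, int32_t count, DemoUChar *result,
                    int32_t maxResultSize, DemoErrorCode *err) {
    int32_t needed, copied, i;
    if (*err > DEMO_ZERO_ERROR) {
        return 0;                              /* incoming failure: do no work */
    }
    if (count < 0) {
        *err = DEMO_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    needed = count + 1;
    if (result == NULL) {
        return needed;                         /* preflight: report the size only */
    }
    if (maxResultSize <= 0) {
        *err = DEMO_BUFFER_OVERFLOW_ERROR;
        return needed;
    }
    /* Copy everything if it fits, otherwise as much as the buffer can hold. */
    copied = (needed <= maxResultSize) ? count : maxResultSize - 1;
    for (i = 0; i < copied; i++) {
        result[i] = c;
    }
    result[copied] = 0;                        /* assumption: always terminate */
    if (needed > maxResultSize) {
        *err = DEMO_BUFFER_OVERFLOW_ERROR;
    }
    return needed;
}

/* Preflights, then allocates exactly enough. The caller must free() the result. */
DemoUChar *demo_repeatAlloc(DemoUChar c, int32_t count, int32_t *length) {
    DemoErrorCode err = DEMO_ZERO_ERROR;
    DemoUChar *buffer;
    int32_t needed;
    assert(length != NULL);
    needed = demo_repeat(c, count, NULL, 0, &err);   /* first pass: size only */
    if (err > DEMO_ZERO_ERROR || needed <= 0) {
        return NULL;
    }
    buffer = (DemoUChar *)malloc((size_t)needed * sizeof *buffer);
    if (buffer == NULL) {
        return NULL;
    }
    demo_repeat(c, count, buffer, needed, &err);     /* second pass: fill */
    if (err > DEMO_ZERO_ERROR) {
        free(buffer);
        return NULL;
    }
    *length = needed - 1;
    return buffer;
}
```

<p>A caller that prefers the second option would instead pass a stack buffer first and fall
back to the heap only when the returned size is larger than the buffer it supplied. </p>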
|
|
|
|
<p>All preflighting functions should have a fill-in <font size="-1" face="Courier New">ErrorCode</font>
parameter (and follow the normal <font size="-1" face="Courier New">ErrorCode</font>
rules), even if some do not currently do so. Buffer overflow would be treated as a
FAILURE error condition, but would <i>not</i> be reported when the caller passes in NULL
for <font size="-1" face="Courier New">actualResultSize</font> (presumably, a NULL for
this parameter means the client doesn't care whether a buffer overflow occurred). All other
failing error conditions will overwrite the "buffer overflow" error, e.g. <font
face="Courier New">MISSING_RESOURCE_ERROR</font>.</p>
|
|
|
|
<h4><b><u>Arrays as return types</u></b></h4>
|
|
|
|
<p>Returning an array of strings is fairly easy in C++, but very hard in C. Rather than
returning the array pointer directly, we opted for an iterative interface: the function is
split into two functions. One returns the number of elements in the array,
and the other returns a single specified element from the array.</p>
|
|
|
|
<blockquote>
|
|
<pre><font size="-1" face="Courier New">int32_t countArrayItems(/* params */);
int32_t getArrayElement(int32_t elementIndex, /* other params */,
                        UChar* result, int32_t maxResultSize, UErrorCode* err);</font></pre>
|
|
</blockquote>
|
|
|
|
<p>In this case, iterating across all the elements in the array would amount to a call to
|
|
the count() function followed by multiple calls to the getElement() function. </p>
|
|
|
|
<blockquote>
|
|
<pre><font size="-1" face="Courier New">for (i = 0; i &lt; countArrayItems(...); i++) {
    UChar element[50];
    getArrayElement(i, ..., element, 50, &amp;err);
    /* do something with element */
}</font></pre>
|
|
</blockquote>
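<p>For a version of this loop that compiles on its own, the sketch below backs the two
functions with a small fixed table. The table, the <font size="-1"
face="Courier New">demo_</font> function names and the error values are all hypothetical
and are not part of ICU; only the count-then-get pattern comes from the text above. </p>

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef uint16_t DemoUChar;  /* hypothetical stand-in for a 16-bit UChar */

typedef enum {
    DEMO_ZERO_ERROR = 0,
    DEMO_BUFFER_OVERFLOW_ERROR = 1,
    DEMO_INDEX_OUTOFBOUNDS_ERROR = 2
} DemoErrorCode;

/* Hypothetical data standing in for an array held by the library. */
static const char *const demoItems[] = { "Mon", "Tue", "Wed" };

/* Returns the number of elements in the array. */
int32_t demo_countItems(void) {
    return (int32_t)(sizeof demoItems / sizeof demoItems[0]);
}

/*
 * Copies one element into result and returns its size, counting the
 * terminating null. A NULL result only reports the size.
 */
int32_t demo_getItem(int32_t index, DemoUChar *result,
                     int32_t maxResultSize, DemoErrorCode *err) {
    const char *item;
    int32_t needed = 0, i;
    if (*err > DEMO_ZERO_ERROR) {
        return 0;
    }
    if (index < 0 || index >= demo_countItems()) {
        *err = DEMO_INDEX_OUTOFBOUNDS_ERROR;
        return 0;
    }
    item = demoItems[index];
    while (item[needed] != '\0') {
        needed++;
    }
    needed++;                                  /* count the terminating null */
    if (result == NULL) {
        return needed;
    }
    for (i = 0; i < needed && i < maxResultSize; i++) {
        result[i] = (DemoUChar)(unsigned char)item[i];
    }
    if (needed > maxResultSize) {
        if (maxResultSize > 0) {
            result[maxResultSize - 1] = 0;     /* keep the truncated text terminated */
        }
        *err = DEMO_BUFFER_OVERFLOW_ERROR;
    }
    return needed;
}

/* Walks every element and returns the total number of characters seen. */
int32_t demo_totalLength(DemoErrorCode *err) {
    int32_t count = demo_countItems();         /* fetch the count once */
    int32_t total = 0, i;
    for (i = 0; i < count; i++) {
        DemoUChar element[50];
        int32_t size = demo_getItem(i, element, 50, err);
        if (*err > DEMO_ZERO_ERROR) {
            break;                             /* stop at the first failure */
        }
        assert(element[size - 1] == 0);        /* element arrived whole */
        total += size - 1;
    }
    return total;
}
```

<p>Fetching the count once before the loop avoids calling the count function on every
iteration, which matters if counting is not a constant-time operation. </p>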
|
|
|
|
<p>In the case of the resource bundle <font face="Courier New">ures_XXXX</font> functions
|
|
returning 2-dimensional arrays, the getElement() function takes both x and y coordinates
|
|
for the desired element, and the count() function returns the number of arrays (x axis).
|
|
Since the size of each array element in the resource 2-D arrays should always be
|
|
the same, this provides an easy-to-use C interface. </p>
|
|
|
|
<blockquote>
|
|
<pre><font size="-1" face="Courier New">void countArrayItems(int32_t* rows, int32_t* columns,
                     /* other params */);

int32_t get2dArrayElement(int32_t rowIndex,
                          int32_t colIndex,
                          /* other params */,
                          UChar* result,
                          int32_t maxResultSize,
                          UErrorCode* err);</font></pre>
|
|
</blockquote>
|
|
|
|
<h3><a NAME="WhereToFindMore"></a><u>Where to Find More Information</u></h3>
|
|
|
|
<p><a href="http://www.ibm.com/java/tools/international-classes/">http://www.ibm.com/java/tools/international-classes/</a>
|
|
is a pointer to general information about the IBM Classes For Unicode. </p>
|
|
|
|
<p><a href="docs/udata.html">docs\udata.html</a> is a raw draft of ICU data handling.</p>
|
|
|
|
<p><a href="../icuhtml/aindex.html">html/aindex.html</a> is an alphabetical index to
|
|
detailed API documentation. <br>
|
|
<a href="../icuhtml/HIERjava.html">html/HIERjava.html</a> is a hierarchical index to
|
|
detailed API documentation. </p>
|
|
|
|
<p><a href="docs/collate.html">docs\collate.html</a> is an overview of collation. </p>
|
|
|
|
<p><a href="docs/BreakIterator.html">docs\BreakIterator.html</a> is a diagram showing how
|
|
BreakIterator processes text elements. </p>
|
|
|
|
<p><a href="http://www.ibm.com/java/education/international-unicode/unicode1.html">http://www.ibm.com/java/education/international-unicode/unicode1.html</a>
|
|
is a pointer to information on how to make applications global. <br>
|
|
</p>
|
|
|
|
<h3><a NAME="SubmittingComments"></a><u>Submitting Comments, Requesting Features and
|
|
Reporting Bugs</u></h3>
|
|
|
|
<p>To submit comments, request features and report bugs, please contact us. While we
|
|
are not able to respond individually to each comment, we do review all comments. Send
|
|
Internet email to <a href="mailto:icu4c@us.ibm.com">icu4c@us.ibm.com</a>. <br>
|
|
</p>
|
|
|
|
<hr>
|
|
|
|
<p>Copyright © 1997-2000 International Business Machines Corporation and others. All
|
|
Rights Reserved.<br>
|
|
IBM Center for Java Technology Silicon Valley, <br>
|
|
10275 N De Anza Blvd., Cupertino, CA 95014 <br>
|
|
All rights reserved. </p>
|
|
|
|
<hr>
|
|
</body>
|
|
</html>
|