Release Date (alpha)
Nov 09, 2004
Note: this file has not been completely updated yet. It will be updated for the final 3.2 release.
For the most recent release, see the ICU4J download site.
The International Components for Unicode (ICU) library provides robust and full-featured Unicode services on a wide variety of platforms. ICU supports the most current version of the Unicode standard, including support for supplementary characters (needed for GB 18030 repertoire support).
Java provides a strong foundation for global programs, and IBM and the ICU team played a key role in providing globalization technology to Java. But because of its long release schedule, Java cannot always keep up with evolving standards. The ICU team continues to extend Java's Unicode and internationalization support, focusing on improving performance, keeping current with the Unicode standard, and providing richer APIs, while remaining as compatible as possible with the original Java text and internationalization API design.
ICU4J is an add-on to the regular JVM that provides:
Note: We continue to provide assistance to Sun, and in some cases, ICU4J support has been rolled into a later release of Java. For example, the Thai word-break is now in Java 1.4. However, the most current and complete version is always found in ICU4J.
A complete report of the API changes between version 3.0 and version 3.2 of ICU4J can be found here. This report is generated by a tool and has some limitations, the most notable of which is that it does not properly reflect the effect of class inheritance changes. Also, because it is generated by a tool, the report does not explain or comment on the changes. For background information and clarification of changes, we recommend that you check the mailing list and archives.
RFC 3066 defines a new format for locale identifiers that incorporates information about the script as well as the language and region into the locale identifier. ICU4J has enhanced the ULocale class to provide this information. All ICU4J APIs that work with Locale have been overloaded to also work with ULocale. ULocale is now the preferred API for specifying a locale ID to ICU4J APIs.
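To make the structure of such IDs concrete, here is a minimal sketch that splits an underscore-separated locale ID such as zh_Hant_TW into language, script, region and variant fields. LocaleIdParts is a hypothetical class that uses only the JDK; it assumes that a four-letter alphabetic field directly after the language is a script code. In real code, use ULocale itself, which exposes the same fields through getLanguage(), getScript(), getCountry() and getVariant().

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical helper, NOT part of ICU4J: splits an underscore-separated
 * locale ID such as "zh_Hant_TW" into its fields, for illustration only.
 */
final class LocaleIdParts {
    final String language;
    final String script;   // "" when the ID has no script field
    final String region;   // "" when the ID has no region field
    final String variant;  // "" when the ID has no variant field

    private LocaleIdParts(String language, String script, String region, String variant) {
        this.language = language;
        this.script = script;
        this.region = region;
        this.variant = variant;
    }

    /** Parses IDs of the form language[_Script][_REGION][_VARIANT]. */
    static LocaleIdParts parse(String id) {
        String[] fields = id.split("_", -1);
        int i = 0;
        String language = fields[i++].toLowerCase(Locale.ROOT);
        String script = "";
        // Assumption: a four-letter alphabetic field right after the language is a script code.
        if (i < fields.length && fields[i].length() == 4 && isAsciiAlpha(fields[i])) {
            String s = fields[i++];
            script = s.substring(0, 1).toUpperCase(Locale.ROOT) + s.substring(1).toLowerCase(Locale.ROOT);
        }
        String region = i < fields.length ? fields[i++].toUpperCase(Locale.ROOT) : "";
        // Whatever is left (possibly several fields) is kept together as the variant.
        String variant = i < fields.length
                ? String.join("_", Arrays.copyOfRange(fields, i, fields.length))
                : "";
        return new LocaleIdParts(language, script, region, variant);
    }

    private static boolean isAsciiAlpha(String s) {
        for (int k = 0; k < s.length(); k++) {
            char c = s.charAt(k);
            if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))) {
                return false;
            }
        }
        return true;
    }

    @Override
    public String toString() {
        return "language=" + language + ", script=" + script
                + ", region=" + region + ", variant=" + variant;
    }
}
```

For example, LocaleIdParts.parse("zh_Hant_TW").toString() yields "language=zh, script=Hant, region=TW, variant=", while "en__POSIX" yields an empty region and the variant POSIX.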
More resource data has been moved out of the core resources into separate resource trees. This will make it easier for clients to trim the data used by ICU4J.
The ICU projects (ICU4C and ICU4J) use the X license. The X license is a non-viral and recommended free software license that is compatible with the GNU GPL license. This became effective with release 1.8.1 of ICU4C and release 1.3.1 of ICU4J in mid-2001. All new ICU releases will adopt the X license; previous ICU releases continue to utilize the IPL (IBM Public License). Users of previous releases of ICU who want to adopt new ICU releases will need to accept the terms and conditions of the X license.
The main effect of the change is to provide GPL compatibility. The X license is listed as GPL compatible; see the GNU page at http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses. This means that GPL projects can now use ICU code; it does not mean that projects using ICU become subject to the GPL.
The text of the X license is available at http://www.x.org/terms.htm. The IBM version contains the essential text of the license, omitting the X-specific trademarks and copyright notices. The full copy of ICU's license is included in the download package.
For more details please see the press announcement and the Project FAQ.
Parts of ICU4J depend on functionality that is only available in JDK 1.4 or later, although some components work under earlier JVMs. All components should be compiled using a Java2 compiler, as even components that run under earlier JVMs can require language features that are only present in Java2. Currently 1.1.x, 1.2.x and 1.3.x JVMs are unsupported and untested, and you use the components on these JVMs at your own risk.
The reference platforms which we support and test ICU4J on are:
Please use the most recent updates of the supported JDK versions.
Additionally, we have built and tested ICU4J on the following unsupported platforms:
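There are two ways to download the ICU4J releases: from the ICU4J web site, or from the CVS repository. To check out the sources from CVS, use the following commands, entering anoncvs when cvs login prompts for the CVS password:
export CVSROOT=:pserver:anoncvs@oss.software.ibm.com:/usr/cvs/icu4j
cvs login
cvs checkout icu4j
cvs logout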
For more details on how to download ICU4J directly from the web site, please also see http://oss.software.ibm.com/icu4j/download/index.html
Below, $Root is the location of the icu4j directory in your file system, for example "drive:\...\icu4j" in your environment. "drive:\..." stands for any drive and any directory on that drive that you chose to install icu4j into.
Information and build files:
readme.html (this file) | A description of ICU4J (International Components for Unicode for Java) |
---|---|
license.html | The X license, used by ICU4J |
build.xml | Ant build file. See How to Install and Build for more information |
The source directories mirror the package structure of the code.
Core packages become part of the ICU4J jar file.
API packages contain classes with supported API.
RichText classes are Core and API, but can be removed from icu4j.jar and built into their own jar.
Directory | Type | Description |
---|---|---|
$Root/src/com/ibm/icu/dev | Non-Core, Non-API | Packages used for internal development. |
$Root/src/com/ibm/icu/impl | Core, Non-API | These are utility classes used from different ICU4J core packages. |
$Root/src/com/ibm/icu/lang | Core, API | Character properties package. |
$Root/src/com/ibm/icu/math | Core, API | Additional math classes. |
$Root/src/com/ibm/icu/text | Core, API | Additional text classes. These add to, and in some cases replace, related core Java classes. |
$Root/src/com/ibm/icu/util | Core, API | Additional utility classes. |
$Root/src/com/ibm/richtext | RichText | Styled text editing package. This includes demos, tests, and GUIs for editing and displaying styled text. The richtext package provides a scrollable display, typing, arrow-key support, tabs, alignment and justification, word- and sentence-selection (by double-clicking and triple-clicking, respectively), text styles, clipboard operations (cut, copy and paste) and a log of changes for undo-redo. Richtext uses Java's TextLayout and complex text support (provided to Sun by the ICU4J team). |
Building ICU4J creates and populates the following directories:
$Root/classes | contains all class files |
---|---|
$Root/doc | contains JavaDoc for all packages |
ICU4J data is stored in the following locations:
com.ibm.icu.impl.data | Holds data used by the ICU4J core packages (com.ibm.icu.lang, com.ibm.icu.text, com.ibm.icu.util and com.ibm.icu.math). In particular, all resource information is stored here. |
---|---|
com.ibm.icu.dev.data | Holds data that is not part of ICU4J core, but rather part of a test, sample, or demo. |
The ICU user's guide contains lots of general information about ICU, in its C, C++, and Java incarnations.
The complete API documentation for ICU4J (javadoc) is available on the ICU4J web site, and can also be built from the sources with the docs Ant target.
To install ICU4J, simply place the prebuilt jar file icu4j.jar on your Java CLASSPATH. No other files are needed.
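A quick way to confirm that icu4j.jar is actually on the classpath is to try loading one of its classes. This is a minimal sketch; ClasspathCheck is a hypothetical helper, and com.ibm.icu.util.ULocale is simply used as a representative ICU4J class.

```java
/**
 * Hypothetical helper: reports whether a class can be loaded from the
 * current classpath, e.g. to verify that icu4j.jar has been installed.
 */
final class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: locate and load the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.ibm.icu.util.ULocale";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

Running it as java -classpath icu4j.jar:. ClasspathCheck (use ; instead of : on Windows) should report the ULocale class as found when the jar is in place.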
Eclipse users: See the ICU4J site for information on how to configure Eclipse to build ICU4J.
To build ICU4J, you will need a Java2 JDK and the Ant build system. We strongly recommend using Ant to build ICU4J. First, install a Java2 JDK; as noted above, parts of ICU4J require JDK 1.4 or later.
Next, install the Ant build system. Ant is a portable, Java-based build system similar to make. ICU4J uses Ant because it introduces no other dependencies, it's portable, and it's easier to manage than a collection of makefiles. We currently build ICU4J with a single Ant build file, build.xml, on both Windows 9x and Linux. The build system requires Ant 1.5 or later.
Installing Ant is straightforward. Download it (see http://ant.apache.org/bindownload.cgi), extract it onto your system, set some environment variables, and add its bin directory to your path. For example:
set JAVA_HOME=C:\jdk1.4.2
set ANT_HOME=C:\ant
set PATH=%PATH%;%ANT_HOME%\bin
See the current Ant documentation for details.
Once the JDK and Ant are installed, building is just a matter of typing ant in the ICU4J root directory. This causes the Ant build system to perform a build as specified by the file build.xml, located in the ICU4J root directory. You can give Ant options like -verbose, and you can specify targets. Ant will only build what's been changed and will resolve dependencies properly. For example:
F:\icu4j>ant tests
Buildfile: build.xml
Project base dir set to: F:\icu4j
Executing Target: core
Compiling 71 source files to F:\icu4j\classes
Executing Target: tests
Compiling 24 source files to F:\icu4j\classes
Completed in 19 seconds
The following are some targets that you can provide to ant. For more targets, see the build.xml file:
all | Build all targets. |
---|---|
core | Build the main class files in the subdirectory classes. If no target is specified, core is assumed. |
tests | Build the test class files. |
demos | Build the demos. |
tools | Build the tools. |
docs | Run javadoc over the main class files, generating an HTML documentation tree in the subdirectory doc. |
jar | Create a jar archive icu4j.jar in the root ICU4J directory containing the main class files. |
jarSrc | Like the jar target, but containing only the source files. |
jarDocs | Like the jar target, but containing only the docs. |
richedit | Build the richedit core class files and tests. |
richeditJar | Create the richedit jar file (which contains only the richedit core class files). The file richedit.jar will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten. |
richeditZip | Create a zip archive of the richedit docs and jar file for distribution. The zip file richedit.zip will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten. |
clean | Remove all built targets, leaving the source. |
For more information, read the Ant documentation and the build.xml file.
After doing a build, it is a good idea to run all the ICU4J tests by typing:
java -classpath $Root/classes -DUnicodeData=$Root/src/com/ibm/icu/dev/data/unicode com.ibm.icu.dev.test.TestAll -nothrow
(If you are allergic to build systems, as an alternative to using Ant you can build by running javac and javadoc directly. This is not recommended. You may have to manually create destination directories.)
Some clients may not wish to ship all of ICU4J with their application, since the application might only use a small part of ICU4J. ICU4J release 2.6 and later provide build options to build individual ICU4J 'modules' for a more compact distribution. The modules are based on a service and the APIs that define it, e.g., the normalizer module supports all the APIs of the Normalizer class (and some others). Tests can be run to verify that the APIs supported by the module function correctly. Because of internal code dependencies, a module contains extra classes that are not part of the module's core service API. Some or most of the APIs of these extra classes will not work. Only the module's core service API is guaranteed. Other APIs may work partially or not at all, so client code should avoid them.
Individual modules are not built directly into their own separate jar files. Since their dependencies often overlap, using separate modules to 'add on' ICU4J functionality would result in unwanted duplication of class files. Instead, building a module causes a subset of ICU4J's classes to be built and put into ICU4J's standard build directory. After one or more module targets are built, the 'moduleJar' target can then be built, which packages the class files into a 'module jar.' Other than the fact that it contains fewer class files, little distinguishes this jar file from a full ICU4J jar file, and in fact they share the same name.
Currently ICU4J can be divided into the following modules:
Module Name | Ant Targets | Test Package Supported | Size‡ |
---|---|---|---|
Normalizer | normalizer, normalizerTests | com.ibm.icu.dev.test.normalizer | 434 KB |
Collator | collator, collatorTests | com.ibm.icu.dev.test.collator | 1,473 KB |
Calendar | calendar, calendarTests | com.ibm.icu.dev.test.calendar | 1,490 KB |
BreakIterator | breakIterator, breakIteratorTests | com.ibm.icu.dev.test.breakiterator | 1,448 KB |
Basic Properties | propertiesBasic, propertiesBasicTests | com.ibm.icu.dev.test.lang | 506 KB |
Full Properties | propertiesFull, propertiesFullTests | com.ibm.icu.dev.test.lang | 1,399 KB |
Formatting | format, formatTests | com.ibm.icu.dev.test.format | 2,426 KB |
StringPrep, IDNA | stringPrep, stringPrepTests | com.ibm.icu.dev.test.stringprep | 456 KB |
Transforms | transliterator, transliteratorTests | com.ibm.icu.dev.test.translit | 1,482 KB |
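Building any of these modules is as easy as specifying a build target to the Ant build system. For example, to build a module that contains only the Normalizer API and package it as a module jar:
ant normalizer
ant moduleJar
To build and run the tests for that module:
ant normalizerTests
java -classpath $Root/classes com.ibm.icu.dev.test.TestAll -nothrow -w
Several modules can be combined. To build a module jar containing both the Normalizer and Collator APIs, and then build and run their tests:
ant normalizer collator
ant moduleJar
ant normalizerTests collatorTests
java -classpath $Root/classes com.ibm.icu.dev.test.TestAll -nothrow -w
To list the targets defined in build.xml:
ant -projecthelp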
Note: the demos provided with ICU4J are for the most part undocumented. This list can show you where to look, but you'll have to experiment a bit. The demos (with the exception of richedit) are unsupported and may change or disappear without notice.
The icu4j.jar file contains only the core ICU4J classes, not the demo classes, so unless you build ICU4J there is little to try out.
java -jar $Root/richedit/richedit.jar
This will present an empty edit pane with an AWT interface.
With a fuller command line you can try out other options, for example:
java -classpath $Root/richedit/richedit.jar com.ibm.richtext.demo.EditDemo [-swing] [file]
This will use an AWT GUI, or a Swing GUI if -swing is passed on the command line. It will open a text file if one is provided; otherwise it will open a blank page. Click to type.
You can add tabs to the tab ruler by clicking in the ruler while holding down the control key. Clicking on an existing tab changes between left, right, center, and decimal tabs. Dragging a tab moves it; dragging it off the ruler removes it.
You can experiment with complex text by using the keymap functions. Please note that these are mainly for demo purposes; for real work with Arabic or Hebrew you will want to use an input method. You will also need a font that supports Arabic or Hebrew; 'Lucida Sans' (provided with Java) supports these languages.
The other demo programs are not supported and exist only to let you experiment with the ICU4J classes. First, build ICU4J using ant all. Then try one of the following:
By default the ICU4J distribution includes all of the standard resource information. It is located under the directory com/ibm/icu/impl/data. Depending on the service, the data is in different locations and in different formats. Note: This will continue to change from release to release, so clients should not depend on the exact organization of the data in ICU4J.
Locale names are documented in the com.ibm.icu.util.ULocale class, and the use of these names in searching for resources is documented in java.util.ResourceBundle.
Some of the data files alias or otherwise reference data from other data files. One reason for this is that some locale names have changed. For example, he_IL used to be iw_IL. In order to support both names without duplicating the data, one of the resource files refers to the other file's data. In other cases, a file may alias a portion of another file's data in order to save space. Currently ICU4J provides no tool for revealing these dependencies.
Note: Java's Locale class silently converts the language code "he" to "iw" when you construct the Locale. Thus Java cannot be used to locate resources that use the "he" language code. ICU, on the other hand, does not perform this conversion in ULocale, and instead uses aliasing in the locale data to represent the same set of data under different locale IDs.
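The aliasing idea can be illustrated with a small model: instead of copying data, an alias entry such as iw pointing to he tells lookups where the data really lives, and lookups follow alias links until they reach a real entry. LocaleAliases is a hypothetical class for illustration only; ICU4J's actual alias mechanism is part of its resource data format, which is not public API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of locale-data aliasing: an alias maps an old or
 * alternate locale ID to the ID under which the data is actually stored.
 */
final class LocaleAliases {
    private final Map<String, String> aliases = new HashMap<>();

    void addAlias(String from, String to) {
        aliases.put(from, to);
    }

    /** Follows alias links until reaching an ID that is not itself an alias. */
    String resolve(String localeId) {
        String current = localeId;
        int hops = 0;
        while (aliases.containsKey(current)) {
            current = aliases.get(current);
            // An acyclic chain can never be longer than the number of aliases.
            if (++hops > aliases.size()) {
                throw new IllegalStateException("alias cycle involving " + localeId);
            }
        }
        return current;
    }
}
```

With the aliases iw to he and iw_IL to he_IL registered, resolve("iw_IL") returns "he_IL", and resolve("he") returns "he" unchanged, so both names reach a single copy of the data.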
Resource files that use locale IDs form a hierarchy, with up to four levels: a root, language, region (country), and variant. Searches for locale data attempt to match as far down the hierarchy as possible. For example, "he_IL" will match he_IL, but "he_US" will match he (since there is no US variant for he), and "xx_YY" will match root (the default fallback locale) since there is no xx language code in the locale hierarchy. Again, see java.util.ResourceBundle for more information.
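The fallback search can be sketched as repeatedly stripping the last underscore-separated field until only the language remains, then falling back to root. LocaleFallback is a hypothetical class; it models only the hierarchy walk described here and ignores the default-locale step that java.util.ResourceBundle also performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of fallback through the locale hierarchy:
 * variant -> region -> language -> root.
 */
final class LocaleFallback {
    /** Candidate IDs from most to least specific, always ending with "root". */
    static List<String> chain(String localeId) {
        List<String> result = new ArrayList<>();
        String current = localeId;
        while (!current.isEmpty()) {
            result.add(current);
            int cut = current.lastIndexOf('_');
            current = cut < 0 ? "" : current.substring(0, cut);
            // Drop empty fields, e.g. "en__POSIX" -> "en_" -> "en".
            while (current.endsWith("_")) {
                current = current.substring(0, current.length() - 1);
            }
        }
        result.add("root");
        return result;
    }

    /** Returns the most specific ID in the chain for which data is available. */
    static String lookup(String localeId, Set<String> available) {
        for (String candidate : chain(localeId)) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        return "root"; // root is the default fallback locale
    }
}
```

With data available for root, he and he_IL, lookup("he_IL", ...) returns "he_IL", lookup("he_US", ...) returns "he", and lookup("xx_YY", ...) returns "root", matching the examples above.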
Currently ICU4J provides no tool for revealing these dependencies between data files, so trimming the data directly in the ICU4J project is a hit-or-miss affair. The key point when you remove data is to make sure to remove all dependencies on that data as well. For example, if you remove he.res, you need to remove he_IL.res, since it is lower in the hierarchy, and you must remove iw.res, since it references he.res, and iw_IL.res, since it depends on it (and also references he_IL.res).
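The removal rule above is transitive: remove a file, then everything that depends on it, then everything that depends on those. Given a hand-built list of dependencies, that closure can be computed with a simple worklist traversal. ResourceTrimmer is a hypothetical helper, and since ICU4J ships no tool that reports these dependencies, the edges must be supplied by you, for example by reading the corresponding ICU4C .txt resource files.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: computes every resource file that must be removed
 * together with a given file, following hierarchy and alias dependencies.
 */
final class ResourceTrimmer {
    // dependent file -> files it depends on (hierarchy parent, alias target, ...)
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    void addDependency(String dependent, String dependency) {
        dependsOn.computeIfAbsent(dependent, k -> new HashSet<>()).add(dependency);
    }

    /** Returns the removed file plus all files that directly or indirectly depend on it. */
    Set<String> filesToRemove(String removed) {
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        result.add(removed);
        work.push(removed);
        while (!work.isEmpty()) {
            String target = work.pop();
            for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
                // result.add returns false for files already scheduled, so each is visited once.
                if (e.getValue().contains(target) && result.add(e.getKey())) {
                    work.push(e.getKey());
                }
            }
        }
        return result;
    }
}
```

Encoding the example above (he_IL.res depends on he.res, iw.res references he.res, iw_IL.res depends on iw.res and references he_IL.res), filesToRemove("he.res") returns he.res, he_IL.res, iw.res and iw_IL.res.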
Unfortunately, the jar tool in the JDK provides no way to remove items from a jar file. Thus you have to extract the resources, remove the ones you don't want, and then create a new jar file with the remaining resources. See the jar tool information for how to do this. Before 'rejarring' the files, be sure to thoroughly test your application with the remaining resources, making sure each required resource is present.
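As an alternative to extracting and re-jarring by hand, the same result can be obtained with the JDK's java.util.jar classes by copying the jar while skipping unwanted entries. JarTrimmer is a hypothetical helper, not part of ICU4J or the JDK; the names you exclude must match the full entry paths stored in the jar, such as those under com/ibm/icu/impl/data. The resulting jar still needs the same thorough testing.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Hypothetical helper: copies a jar, leaving out the named entries. */
final class JarTrimmer {
    /**
     * Copies every entry of source into target except those whose names are
     * in excluded. Returns the number of entries copied.
     */
    static int copyWithout(File source, File target, Set<String> excluded) throws IOException {
        int copied = 0;
        try (JarFile in = new JarFile(source);
             JarOutputStream out = new JarOutputStream(new FileOutputStream(target))) {
            byte[] buffer = new byte[8192];
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (excluded.contains(entry.getName())) {
                    continue; // drop this resource
                }
                // Fresh entry with the same name; sizes and CRC are recomputed on write.
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream data = in.getInputStream(entry)) {
                    int n;
                    while ((n = data.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                out.closeEntry();
                copied++;
            }
        }
        return copied;
    }
}
```

The original jar is left untouched, so you can compare the two while testing before replacing icu4j.jar with the trimmed copy.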
Warning: Resource file formats can change across releases of ICU4J! The format of ICU4J resources is not part of the API. Clients who develop their own resources for use with ICU4J should be prepared to regenerate them when they move to new releases of ICU4J.
ICU4J 3.0's resource mechanism is new for this release and we are still developing it. Currently it is not possible to mix ICU's new binary .res resources with traditional Java-style .class or .txt resources. We might allow for this in a future release, but since the resource data and format are not formally supported, you run the risk of incompatibilities with future releases of ICU4J.
Resource data in ICU4J is checked in to the repository as a jar file containing the resource binaries, icudata.jar. This means that inspecting the contents of these resources is difficult. They currently are compiled from ICU4C .txt file data. You can view the contents of the ICU4C text resource files to understand the contents of the ICU4J resources.
The files in icudata.jar get extracted to com/ibm/icu/impl/data in the build directory when the 'core' target is built. Thereafter, as long as the res_index.res file is untouched, they will not be extracted again. Building the 'resources' target will force the resources to be extracted again. Extraction will overwrite any corresponding resource files already in that directory.
http://oss.software.ibm.com/icu4j/ is a pointer to general information about the International Components for Unicode in Java
http://www.ibm.com/developer/unicode is a pointer to information on how to make applications global.
Your comments are important to making ICU4J successful. We are committed to fixing any bugs, and will use your feedback to help plan future releases.
To submit comments, request features and report bugs, contact us through the ICU4J mailing list. While we are not able to respond individually to each comment, we do review all comments.
Copyright © 2002-2004 International Business Machines Corporation and others. All Rights Reserved.
5600 Cottle Road, San José, CA 95193