Release Date
November 3rd, 2003
For the most recent release, see the ICU4J download site.
Today's global market demands programs that support a wide variety of languages and national conventions. Customers prefer software and web pages tailored to their needs, and studies confirm that this leads to increased sales. Java provides a strong foundation for global programs, and IBM and the ICU4J team played a key role in providing globalization technology to Sun for use in Java.
But Java does not yet provide all the features that some products require. ICU4J is an add-on library that extends Java's globalization technology by providing the following tools:
Produces canonical text representations, needed for XML and the net.
Required for correct presentation of dates in some countries.
Enhances standard Java number formatting. The spelled-out format is used for checks and similar documents.
Required for correct support of Thai.
Suitable for large numbers of small fields, where LZW and similar schemes do not apply.
For fast multilingual string comparison
The ICU projects (ICU4C and ICU4J) use the X license. The X license is a non-viral and recommended free software license that is compatible with the GNU GPL license. This became effective with release 1.8.1 of ICU4C and release 1.3.1 of ICU4J in mid-2001. All new ICU releases will adopt the X license; previous ICU releases continue to utilize the IPL (IBM Public License). Users of previous releases of ICU who want to adopt new ICU releases will need to accept the terms and conditions of the X license.
The main effect of the change is to provide GPL compatibility. The X license is listed as GPL compatible; see the GNU page at http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses. This means that GPL projects can now use ICU code; it does not mean that projects using ICU become subject to the GPL.
The text of the X license is available at http://www.x.org/terms.htm. The IBM version contains the essential text of the license, omitting the X-specific trademarks and copyright notices. The full copy of ICU's license is included in the download package.
For more details please see the press announcement and the Project FAQ.
Parts of ICU4J depend on functionality that is only available in JDK 1.3 or later, although some components work under earlier JVMs. All components should be compiled using a Java2 compiler, as even components that run under earlier JVMs can require language features that are only present in Java2. Currently 1.1.x and 1.2.x JVMs are unsupported and untested, and you use the components on these JVMs at your own risk.
The reference platforms which we support and test ICU4J on are:
Please use the most recent updates of the supported JDK versions.
Additionally, we have built and tested ICU4J on the following unsupported platforms:
There are two ways to download the ICU4J releases.
export CVSROOT=:pserver:anoncvs@oss.software.ibm.com:/usr/cvs/icu4j
cvs login
CVS password: anoncvs
cvs checkout icu4j
cvs logout
For more details on how to download ICU4J directly from the web site, please also see http://oss.software.ibm.com/icu4j/download/index.html
Below, $Root is the location of the icu4j directory in your file system, such as "drive:\...\icu4j". Here "drive:\..." stands for whichever drive and directory you chose to install icu4j into.
Information and build files:
readme.html (this file) | A description of ICU4J (International Components for Unicode for Java) |
releasenotes.html | A description of features and changes in this and prior releases of ICU4J |
license.html | The X license, used by ICU4J |
build.bat | A convenience bat file for building ICU4J with Ant on Windows |
build.sh | A convenience sh file for building ICU4J with Ant on Unix |
build.xml | Ant build file. See How to Install and Build for more information |
The source directories mirror the package structure of the code.
Core packages become part of the ICU4J jar file.
API packages contain classes with supported API.
RichText classes are Core and API, but can be removed from icu4j.jar, and can be built into their own jar.
$Root/src/com/ibm/icu/dev Non-Core, Non-API |
Packages used for internal development:
|
$Root/src/com/ibm/icu/impl Core, Non-API |
These are utility classes used from different ICU4J core packages. |
$Root/src/com/ibm/icu/lang Core, API |
Character properties package. |
$Root/src/com/ibm/icu/math Core, API |
Additional math classes. |
$Root/src/com/ibm/icu/text Core, API |
Additional text classes. These add to, and in some cases replace, related core Java classes:
|
$Root/src/com/ibm/icu/util Core, API |
Additional utility classes:
|
$Root/src/com/ibm/richtext RichText |
Styled text editing package. This includes demos, tests, and GUIs for editing and displaying styled text. The richtext package provides a scrollable display, typing, arrow-key support, tabs, alignment and justification, word- and sentence-selection (by double-clicking and triple-clicking, respectively), text styles, clipboard operations (cut, copy and paste) and a log of changes for undo-redo. Richtext uses Java's TextLayout and complex text support (provided to Sun by the ICU4J team). |
Building ICU4J creates and populates the following directories:
$Root/classes | contains all class files |
$Root/doc | contains JavaDoc for all packages |
Data organization:
ICU4J data is stored in the following locations:
com.ibm.icu.impl.data |
Holds data used by the ICU4J core packages (com.ibm.icu.lang, com.ibm.icu.math, com.ibm.icu.text, and com.ibm.icu.util). In particular, all resource information is stored here. |
com.ibm.icu.dev.data |
Holds data that is not part of ICU4J core, but rather part of a test, sample, or demo. |
The ICU user's guide contains lots of general information about ICU, in its C, C++, and Java incarnations.
The complete API documentation for ICU4J (javadoc) is available on the ICU4J web site, and can be built from the sources:
To install ICU4J, simply place the prebuilt jar file icu4j.jar on your Java CLASSPATH. No other files are needed.
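A quick way to confirm that the jar is actually visible to the JVM is to try loading one of its classes. The following is only a sketch: the class `IcuClasspathCheck` and its method `isAvailable` are hypothetical names introduced for illustration, and `com.ibm.icu.lang.UCharacter` is used as the probe simply because it appears among the core classes listed in this document.

```java
/** Hypothetical helper (not part of ICU4J): reports whether a class can be loaded. */
final class IcuClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, IcuClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe class chosen for illustration; any core ICU4J class would do.
        String probe = "com.ibm.icu.lang.UCharacter";
        System.out.println(probe + (isAvailable(probe)
                ? " found"
                : " not found; check that icu4j.jar is on the CLASSPATH"));
    }
}
```

Run it with the same CLASSPATH your application uses; a "not found" result usually means the jar path is misspelled or missing.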
To build ICU4J, you will need a Java2 JDK and the Ant build system. We strongly recommend using the Ant build system to build ICU4J:
Next, install the Ant build system. Ant is a portable, Java-based build system similar to make. ICU4J uses Ant because it introduces no other dependencies, it's portable, and it's easier to manage than a collection of makefiles. We currently use Ant to build ICU4J from a single build file on both Windows 9x and Linux. The build system requires Ant 1.5 or later.
Installing Ant is straightforward. Download it (see http://ant.apache.org/bindownload.cgi), extract it onto your system, set some environment variables, and add its bin directory to your path. For example:
set JAVA_HOME=C:\jdk1.3.1
set ANT_HOME=C:\ant
set PATH=%PATH%;%ANT_HOME%\bin
See the current Ant documentation for details.
Once the JDK and Ant are installed, building is just a matter of typing ant in the ICU4J root directory. This causes the Ant build system to perform a build as specified by the file build.xml, located in the ICU4J root directory. You can give Ant options like -verbose, and you can specify targets. Ant will only build what's been changed and will resolve dependencies properly. For example:
F:\icu4j>ant tests
Buildfile: build.xml
Project base dir set to: F:\icu4j
Executing Target: core
Compiling 71 source files to F:\icu4j\classes
Executing Target: tests
Compiling 24 source files to F:\icu4j\classes
Completed in 19 seconds
The following are some targets that you can give after ant. For more targets, see the build.xml file:
all | Build all targets. |
core | Build the main class files in the subdirectory classes. If no target is specified, core is assumed. |
tests | Build the test class files. |
demos | Build the demos. |
tools | Build the tools. |
docs | Run javadoc over the main class files, generating an HTML documentation tree in the subdirectory doc. |
jar | Create a jar archive icu4j.jar in the root ICU4J directory containing the main class files. |
zip | Create a zip archive of the source, docs, and jar file for distribution. The zip file icu4jYYYYMMDD.zip will be created in the directory above the root ICU4J directory, where YYYYMMDD is today's date. Any existing file of that name will be overwritten. |
zipsrc | Like the zip target, without the docs and the jar file. The zip file icu4jsrcYYYYMMDD.zip will be created in the directory above the root ICU4J directory. |
richedit | Build the richedit core class files and tests. |
richeditJar | Create the richedit jar file (which contains only the richedit core class files). The file richedit.jar will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten. |
richeditZip | Create a zip archive of the richedit docs and jar file for distribution. The zip file richedit.zip will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten. |
clean | Remove all built targets, leaving the source. |
For more information, read the Ant documentation and the build.xml file.
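If a script needs to pick up the archives produced by the zip and zipsrc targets, it has to predict their date-stamped names. The sketch below shows one way to compute them; `DistArchiveNames` and its methods are hypothetical and are not part of the ICU4J build, and it assumes the date stamp is the local calendar date of the build.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper (not part of the ICU4J build): predicts distribution archive names. */
final class DistArchiveNames {

    // BASIC_ISO_DATE renders a LocalDate as YYYYMMDD.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.BASIC_ISO_DATE;

    /** Archive name in the icu4jYYYYMMDD.zip pattern described for the zip target. */
    static String zipName(LocalDate date) {
        return "icu4j" + date.format(STAMP) + ".zip";
    }

    /** Archive name in the icu4jsrcYYYYMMDD.zip pattern described for the zipsrc target. */
    static String zipSrcName(LocalDate date) {
        return "icu4jsrc" + date.format(STAMP) + ".zip";
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(zipName(today));
        System.out.println(zipSrcName(today));
    }
}
```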
After doing a build, it is a good idea to run all the ICU4J tests by typing:
java -classpath $Root/classes -DUnicodeData=$Root/src/com/ibm/icu/dev/data/unicode com.ibm.icu.dev.test.TestAll
(If you are allergic to build systems, as an alternative to using Ant you can build by running javac and javadoc directly. This is not recommended. You may have to manually create destination directories.)
Module Name | Ant Targets | Test Package Supported | Main Classes*† | Size |
---|---|---|---|---|
Normalizer | normalizer, normalizerTests | com.ibm.icu.dev.test.normalizer | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript text.Normalizer text.Replaceable text.ReplaceableString text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeFilter text.UnicodeMatcher text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.ValueIterator util.VersionInfo | 355 KB |
Collator | collator, collatorTests | com.ibm.icu.dev.test.collator | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript text.BreakDictionary text.BreakIterator text.BreakIteratorFactory text.CanonicalIterator text.CollationElementIterator text.CollationKey text.CollationParsedRuleBuilder text.CollationRuleParser text.Collator text.CollatorReader text.DictionaryBasedBreakIterator text.Normalizer text.Replaceable text.ReplaceableString text.RuleBasedBreakIterator text.RuleBasedCollator text.SearchIterator text.StringSearch text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeFilter text.UnicodeMatcher text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.ValueIterator util.VersionInfo | 1,716 KB |
Calendar | calendar, calendarTests | com.ibm.icu.dev.test.calendar | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript math.BigDecimal text.BreakIterator text.ChineseDateFormat text.ChineseDateFormatSymbols text.DateFormat text.DateFormatSymbols text.DecimalFormat text.DecimalFormatSymbols text.Normalizer text.NumberFormat text.Replaceable text.ReplaceableString text.SimpleDateFormat text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeFilter text.UnicodeMatcher text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.BuddhistCalendar util.Calendar util.CalendarAstronomer util.CalendarCache util.CalendarFactory util.ChineseCalendar util.Currency util.DateRule util.EasterHoliday util.EasterRule util.GregorianCalendar util.HebrewCalendar util.HebrewHoliday util.Holiday util.IslamicCalendar util.JapaneseCalendar util.RangeDateRule util.SimpleDateRule util.SimpleHoliday util.SimpleTimeZone util.SimpleTimeZoneAdapter util.TimeZone util.TimeZoneData util.ValueIterator util.VersionInfo | 1,160 KB |
BreakIterator | breakIterator, breakIteratorTests | com.ibm.icu.dev.test.breakiterator | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript text.BreakDictionary text.BreakIterator text.BreakIteratorFactory text.DictionaryBasedBreakIterator text.Normalizer text.Replaceable text.ReplaceableString text.RuleBasedBreakIterator text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeFilter text.UnicodeMatcher text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.RangeValueIterator util.ValueIterator util.VersionInfo | 1,105 KB |
Basic Properties | propertiesBasic, propertiesTests | com.ibm.icu.dev.test.lang | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript lang.UScriptRun text.BreakDictionary text.BreakIterator text.BreakIteratorFactory text.DictionaryBasedBreakIterator text.Normalizer text.Replaceable text.ReplaceableString text.RuleBasedBreakIterator text.SymbolTable text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeFilter text.UnicodeMatcher text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.CompactByteArray util.RangeValueIterator util.ValueIterator util.VersionInfo | 350 KB |
Full Properties | propertiesFull, propertiesTests | com.ibm.icu.dev.test.lang | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript lang.UScriptRun text.BreakDictionary text.BreakIterator text.BreakIteratorFactory text.DictionaryBasedBreakIterator text.Normalizer text.Replaceable text.ReplaceableString text.RuleBasedBreakIterator text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeFilter text.UnicodeMatcher text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.RangeValueIterator util.ValueIterator util.VersionInfo | 986 KB |
Formatting | format, formatTests | com.ibm.icu.dev.test.format | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript math.BigDecimal text.BreakIterator text.CanonicalIterator text.ChineseDateFormat text.ChineseDateFormatSymbols text.CollationElementIterator text.CollationKey text.CollationParsedRuleBuilder text.CollationRuleParser text.Collator text.CollatorReader text.DateFormat text.DateFormatSymbols text.DecimalFormat text.DecimalFormatSymbols text.Normalizer text.NumberFormat text.Replaceable text.ReplaceableString text.RuleBasedCollator text.RuleBasedNumberFormat text.SimpleDateFormat text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeFilter text.UnicodeMatcher text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.Calendar util.CalendarAstronomer util.CalendarCache util.CalendarFactory util.ChineseCalendar util.Currency util.GregorianCalendar util.RangeValueIterator util.SimpleTimeZone util.SimpleTimeZoneAdapter util.TimeZone util.TimeZoneData util.ValueIterator util.VersionInfo | 1,788 KB |
Transforms | transliterator, transliteratorTests | com.ibm.icu.dev.test.translit | lang.UCharacter lang.UCharacterCategory lang.UCharacterDirection lang.UCharacterNameIterator lang.UCharacterTypeIterator lang.UProperty lang.UScript lang.UScriptRun text.AnyTransliterator text.BreakDictionary text.BreakIterator text.BreakIteratorFactory text.BreakTransliterator text.DictionaryBasedBreakIterator text.Normalizer text.Replaceable text.ReplaceableString text.RuleBasedBreakIterator text.StringReplacer text.Transliterator text.UCharacterIterator text.UForwardCharacterIterator text.UnicodeSet text.UnicodeSetIterator text.UTF16 util.RangeValueIterator util.ValueIterator util.VersionInfo | 1,364 KB |
ant normalizer
ant normalizerTests
java -classpath $icu4j_root/classes com.ibm.icu.dev.test.TestAll -nothrow -w
ant moduleJar
ant normalizer collator
ant normalizerTests collatorTests
java -classpath $icu4j_root/classes com.ibm.icu.dev.test.TestAll -nothrow -w
ant moduleJar
ant -projecthelp
java -classpath $icu4j_root/icu4j.jar com.ibm.icu.dev.test.TestAll -nothrow -w
Note: the demos provided with ICU4J are for the most part undocumented. This list can show you where to look, but you'll have to experiment a bit. The demos (with the exception of richedit) are unsupported and may change or disappear without notice.
The icu4j.jar file contains only the core ICU4J classes, not the demo classes, so unless you build ICU4J there is little to try out.
java -jar $Root/richedit/richedit.jar
This will present an empty edit pane with an AWT interface.
With a fuller command line you can try out other options, for example:
java -classpath $Root/richedit/richedit.jar com.ibm.richtext.demo.EditDemo [-swing][file]
This will use an AWT GUI, or a Swing GUI if -swing is passed on the command line. It will open a text file if one is provided; otherwise it will open a blank page. Click to type.
You can add tabs to the tab ruler by clicking in the ruler while holding down the control key. Clicking on an existing tab changes between left, right, center, and decimal tabs. Dragging a tab moves it; dragging it off the ruler removes it.
You can experiment with complex text by using the keymap functions. Please note that these are mainly for demo purposes; for real work with Arabic or Hebrew you will want to use an input method. You will need to use a font that supports Arabic or Hebrew; 'Lucida Sans' (provided with Java) supports these languages.
The other demo programs are not supported and exist only to let you experiment with the ICU4J classes. First, build ICU4J using ant all. Then try one of the following:
By default the ICU4J distribution includes all of the new resource information. It is located in the package com.ibm.icu.impl.data, as a set of class files named "LocaleElements" followed by the names of locales in the form _xx_YY_ZZZZ, where 'xx' is the two-letter language code, 'YY' is the country code, and 'ZZZZ' (which can be any length) is a variant. Many of these fields can be omitted. Locale naming is documented in the Locale class, java.util.Locale, and the use of these names in searching for resources is documented in java.util.ResourceBundle.
Some of these files require separate binary data. The names of the binary data files start with "CollationElements", then the corresponding Locale string, and end with '.res'. Another data file (only one at the moment) starts with the name "BreakDictionaryData", the corresponding Locale string, and ends with '.ucs'.
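The naming scheme can be summarized in a few lines of code. This is a sketch only: `ResourceNames` and its methods are hypothetical, not ICU4J API. The underscore between "CollationElements" and the locale string follows the CollationElements_zh__PINYIN.res example used later in this document; the same separator for "BreakDictionaryData" is an assumption.

```java
/** Hypothetical helper (not part of ICU4J): builds the resource file names described above. */
final class ResourceNames {

    /**
     * Joins language, country and variant with underscores. An empty country
     * followed by a variant leaves a double underscore, as in "zh__PINYIN".
     */
    static String localeSuffix(String language, String country, String variant) {
        StringBuilder sb = new StringBuilder(language);
        boolean hasVariant = !variant.isEmpty();
        if (!country.isEmpty() || hasVariant) {
            sb.append('_').append(country);
        }
        if (hasVariant) {
            sb.append('_').append(variant);
        }
        return sb.toString();
    }

    /** Class file holding the locale data; the root resource has no suffix. */
    static String localeElementsClass(String language, String country, String variant) {
        String suffix = localeSuffix(language, country, variant);
        return suffix.isEmpty() ? "LocaleElements.class" : "LocaleElements_" + suffix + ".class";
    }

    /** Binary collation data file for the locale. */
    static String collationDataFile(String language, String country, String variant) {
        return "CollationElements_" + localeSuffix(language, country, variant) + ".res";
    }

    /** Binary break dictionary file (the "_" separator is assumed, see the note above). */
    static String breakDictionaryFile(String language, String country, String variant) {
        return "BreakDictionaryData_" + localeSuffix(language, country, variant) + ".ucs";
    }

    public static void main(String[] args) {
        System.out.println(localeElementsClass("he", "IL", ""));
        System.out.println(localeElementsClass("zh", "", "PINYIN"));
        System.out.println(collationDataFile("zh", "", "PINYIN"));
    }
}
```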
Some of the LocaleElements files share data with other LocaleElements files, because some Locale names have changed. For example, he_IL used to be iw_IL. In order to support both names but not duplicate the data, one of the class files refers to the other class file's data.
The list of supported resources is found in a file called LocaleElements_index.class. This contains the names of all the LocaleElements resources and is the source of the information returned by API such as Calendar.getAvailableLocales. (Note: for ease of customization this probably should be a text file).
LocaleElements files form a hierarchy, with up to four levels: a root, language, region (country), and variant. Searches for locale data attempt to match as far down the hierarchy as possible. For example, 'he_IL' will match LocaleElements_he_IL, but 'he_US' will match LocaleElements_he (since there is no 'US' variant for 'he'), and 'xx_YY' will match LocaleElements (since there is no 'xx' language code in the LocaleElements hierarchy). Again, see java.util.ResourceBundle for more information.
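The lookup described above can be sketched as a walk from the most specific name to the root. `LocaleFallback` is a hypothetical illustration, not the ICU4J or JDK implementation; in particular it leaves out the default-locale step that java.util.ResourceBundle also performs before falling back to the root.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the LocaleElements fallback order (simplified). */
final class LocaleFallback {

    /** Candidate resource names from most to least specific, ending with the root. */
    static List<String> candidates(String localeName) {
        List<String> names = new ArrayList<>();
        String current = localeName;
        while (!current.isEmpty()) {
            names.add("LocaleElements_" + current);
            int cut = current.lastIndexOf('_');
            current = cut < 0 ? "" : current.substring(0, cut);
            // An empty country (as in "zh__PINYIN") leaves a trailing underscore; drop it.
            while (current.endsWith("_")) {
                current = current.substring(0, current.length() - 1);
            }
        }
        names.add("LocaleElements");
        return names;
    }

    /** Returns the first candidate that exists, or null if even the root is missing. */
    static String resolve(String localeName, Set<String> available) {
        for (String name : candidates(localeName)) {
            if (available.contains(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> available = new HashSet<>(Arrays.asList(
                "LocaleElements", "LocaleElements_he", "LocaleElements_he_IL"));
        for (String locale : new String[] {"he_IL", "he_US", "xx_YY"}) {
            System.out.println(locale + " -> " + resolve(locale, available));
        }
    }
}
```

With only the root, he, and he_IL resources available, the three lookups in `main` resolve to LocaleElements_he_IL, LocaleElements_he, and LocaleElements respectively, matching the examples in the text.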
With this in mind, the way to remove locale data is to make sure to remove all dependencies on that data as well. For example, if you remove LocaleElements_he.class, you need to remove LocaleElements_he_IL.class, since it is lower in the hierarchy, and you must remove LocaleElements_iw.class, since it references LocaleElements_he, and LocaleElements_iw_IL.class, since it depends on it (and also references LocaleElements_he_IL). For another example, if you remove CollationElements_zh__PINYIN.res, you must also remove LocaleElements_zh__PINYIN.class, since it depends on CollationElements_zh__PINYIN.res.
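Working out everything that must go along with a removed file is a transitive-closure problem. The sketch below assumes you have written down the dependencies by hand as a map; `ResourceRemoval` is a hypothetical helper, and the map in `main` only encodes the examples from the paragraph above, not the full ICU4J data set.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds every file that must be removed together with a target. */
final class ResourceRemoval {

    /**
     * @param target    the file you want to remove
     * @param dependsOn maps each file to the files it needs
     * @return the target plus every file that directly or indirectly depends on it
     */
    static Set<String> removalSet(String target, Map<String, List<String>> dependsOn) {
        Set<String> toRemove = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(target);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            if (!toRemove.add(current)) {
                continue; // already handled
            }
            // Anything that needs 'current' has to be removed as well.
            for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
                if (e.getValue().contains(current)) {
                    pending.add(e.getKey());
                }
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("LocaleElements_he_IL.class", Arrays.asList("LocaleElements_he.class"));
        deps.put("LocaleElements_iw.class", Arrays.asList("LocaleElements_he.class"));
        deps.put("LocaleElements_iw_IL.class",
                Arrays.asList("LocaleElements_iw.class", "LocaleElements_he_IL.class"));
        deps.put("LocaleElements_zh__PINYIN.class",
                Arrays.asList("CollationElements_zh__PINYIN.res"));
        System.out.println(removalSet("LocaleElements_he.class", deps));
    }
}
```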
Unfortunately, the jar tool in the JDK provides no way to remove items from a jar file. Thus you have to extract the resources, remove the ones you don't want, and then create a new jar file with the remaining resources. See the jar tool information for how to do this. Before 'rejarring' the files, be sure to thoroughly test your application with the remaining resources, making sure each required resource is present.
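As an alternative to extracting and repacking by hand, the same filtering can be scripted with the java.util.zip classes, since a jar file is a zip archive. This is a minimal sketch, not an ICU4J tool: `JarFilter` and `copyWithout` are hypothetical names, entries are matched by exact path, and signed jars are not handled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: copies a jar while leaving out the named entries. */
final class JarFilter {

    /** Copies source to target, skipping excluded entry paths; returns the number skipped. */
    static int copyWithout(Path source, Path target, Set<String> excluded) throws IOException {
        int skipped = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (excluded.contains(entry.getName())) {
                    skipped++;
                    continue; // getNextEntry() skips the rest of this entry
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarFilter <in.jar> <out.jar> <entry-path-to-drop>...
        if (args.length < 2) {
            System.err.println("usage: JarFilter <in.jar> <out.jar> [entry ...]");
            return;
        }
        Set<String> excluded = new HashSet<>(Arrays.asList(args).subList(2, args.length));
        int skipped = copyWithout(Paths.get(args[0]), Paths.get(args[1]), excluded);
        System.out.println("Skipped " + skipped + " entries");
    }
}
```

Entry paths use forward slashes, for example com/ibm/icu/impl/data/LocaleElements_he.class. The advice above still applies: test the application against the reduced jar before shipping it.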
ICU4J 2.1 and above uses the standard class lookup mechanism. This means any appropriately named resource on the CLASSPATH will be located, in the order listed in the classpath.
If you create a resource file com.ibm.icu.impl.data.LocaleElements_xx_YY.class, and list it on the CLASSPATH before icu4j.jar, your resource will be used in place of any existing LocaleElements_xx_YY resource in icu4j. This is a good way to try out changes to resources. You can, for example, include the resource in your application's jar file and list it ahead of icu4j.jar.
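When experimenting with an overriding resource, it helps to confirm which copy the JVM actually finds first. The sketch below asks the system class loader for the location of a class file; `ResourceLocator` is a hypothetical helper, and the class name in `main` is the placeholder name from the paragraph above.

```java
import java.net.URL;

/** Hypothetical helper: shows where a class file is found on the classpath. */
final class ResourceLocator {

    /** Returns the URL of the first location that supplies the class file, or null if none does. */
    static URL locate(String className) {
        String path = className.replace('.', '/') + ".class";
        return ClassLoader.getSystemResource(path);
    }

    public static void main(String[] args) {
        // Placeholder name from the text; substitute the resource you are overriding.
        String name = "com.ibm.icu.impl.data.LocaleElements_xx_YY";
        URL url = locate(name);
        System.out.println(url == null ? name + " is not on the classpath" : "Loaded from " + url);
    }
}
```

If the printed URL points into icu4j.jar rather than your own jar or directory, your resource is listed too late on the CLASSPATH.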
In order to create new resources, you first must thoroughly understand the various elements contained in the resource files, their syntax and dependencies. You cannot simply 'patch' existing resource files with a single change because the new file completely replaces the old file in the resource hierarchy. In general, the new resource file should contain all the different data that the old one did, plus your changes.
Adding a new 'leaf' resource is easiest. Elements defined in that resource will override corresponding ones in the resources further up the hierarchy. Thus you can, for example, try out new localized names of days of the week, as they are all contained in one element. The variant mechanism can be used to temporarily try out new versions of existing resource elements (though we don't recommend shipping this way). Note though that some resources have detailed dependencies on each other, so that you cannot simply assume that a new element with the same structure and number of contents will 'just work.'
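The override behavior of a leaf resource can be illustrated with the JDK's own java.util.ListResourceBundle, used here purely as a stand-in; this document does not state how the ICU4J LocaleElements classes are implemented. The class names and the element keys "DayAbbreviations" and "AmPmMarkers" are chosen for illustration.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

/** Illustration only: a leaf bundle overriding one element of its parent. */
final class OverrideDemo {

    /** Stand-in for an existing resource higher in the hierarchy. */
    static final class ParentElements extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"DayAbbreviations", new String[] {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"}},
                {"AmPmMarkers", new String[] {"AM", "PM"}},
            };
        }
    }

    /** Stand-in for a new leaf resource that redefines a single element. */
    static final class LeafElements extends ListResourceBundle {
        LeafElements(ResourceBundle parent) {
            setParent(parent); // lookups that miss here continue in the parent
        }

        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                // The whole element is replaced; there is no per-entry merging.
                {"DayAbbreviations", new String[] {"Su", "Mo", "Tu", "We", "Th", "Fr", "Sa"}},
            };
        }
    }

    public static void main(String[] args) {
        ResourceBundle leaf = new LeafElements(new ParentElements());
        System.out.println(leaf.getStringArray("DayAbbreviations")[0]); // defined in the leaf
        System.out.println(leaf.getStringArray("AmPmMarkers")[0]);      // inherited from the parent
    }
}
```

Note that the leaf replaces the element as a unit, which is why all seven day names must be supplied even if only one changes.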
Patching an 'internal' resource (say, one corresponding to an existing language resource that has children) requires careful analysis of the contents of the resources.
LocaleElements resource data in ICU4J is checked in to the repository as precompiled class files. This means that inspecting the contents of these resources is difficult. They are compiled from java files that in turn are machine-generated from ICU4C binary data, using the genrb tool in ICU4C. You can view the contents of the ICU4C text resource files to understand the contents of the ICU4J resources, as they are the same.
Currently only the LocaleElements resource data is shared; other ICU resources (calendar, transliterator, etc.) are still checked in directly to ICU4J as source files. This means that development and maintenance of these resources continues as before; only LocaleElements resource data has been changed in ICU4J 2.1. This will probably change in the future once we work out a reasonable mechanism for storing and generating the resource data.
One goal of using the same resource data as ICU4C is to avoid keeping redundant copies of the resource data. Currently there is no separate repository of the 'master' resource data; it is checked in to ICU4C, and the tools for converting it to .java files are ICU4C tools. This is inconvenient for working in Java, but since maintenance of ICU4J and ICU4C is supposed to go on 'in parallel,' as a practical matter people will have to be familiar with development in both C and Java, and with the conventions and structure of each project. Additionally, sharing of data means that modifications to data immediately impact both projects (as they should), and thus both projects need to be tested when such changes are made. The bulk of the tools are currently on the ICU4C side, and will likely stay that way, so this seems like a reasonable initial approach to sharing the data.
While prototyping of LocaleElements data can occur in either Java or C, the final version should be checked in to ICU4C in text format. Genrb is then run to generate the .java and .res files. They are then compiled and jar'd into the file ICULocaleData.jar. The resulting jar file is then checked in to ICU4J as src/com/ibm/icu/dev/data/ICULocaleData.jar. (This is not great but it allows ICU4J to be downloaded and built as one project, instead of two, one for locale data and one for ICU4J proper. Given the 2.4 schedule it wasn't possible to work out the larger data sharing problem in time, so we tried to limit the impact to just what was needed to get JDK 1.4 support up and running.)
The files in ICULocaleData.jar get extracted to com/ibm/icu/impl/data in the build directory when the 'core' target is built. Thereafter, as long as the file LocaleElements_index.class is untouched, they will not be extracted again. Building the 'resource' target will force the resources to be extracted once again. Extraction will overwrite any corresponding .class files already in that directory.
For information specific to this current release, please refer to the releasenotes.html
http://oss.software.ibm.com/icu4j/ is a pointer to general information about the International Components for Unicode in Java
http://www.ibm.com/developer/unicode is a pointer to information on how to make applications global.
Your comments are important to making ICU4J successful. We are committed to fixing any bugs, and will use your feedback to help plan future releases.
To submit comments, request features and report bugs, contact us through the ICU4J mailing list.
While we are not able to respond individually to each comment, we do review all comments.
Thanks for your interest in ICU4J!
Copyright © 2002-2003 International Business Machines Corporation and others. All Rights Reserved.
5600 Cottle Road, San José, CA 95193