International Components for Unicode for Java (ICU4J)

Read Me for ICU4J 3.8


Release Date
September 14, 2007

Note: This is a major release of ICU4J. It contains bug fixes, adds implementations of inherited API, and introduces new API and functionality.

For the most recent release, see the ICU4J download site.

Introduction to ICU4J

The International Components for Unicode (ICU) library provides robust and full-featured Unicode services on a wide variety of platforms. ICU supports the most current version of the Unicode standard, including support for supplementary characters (needed for GB 18030 repertoire support).

Java provides a strong foundation for global programs, and IBM and the ICU team played a key role in providing globalization technology to Java. But because of its long release schedule, Java cannot always keep up with evolving standards. The ICU team continues to extend Java's Unicode and internationalization support, focusing on improving performance, keeping current with the Unicode standard, and providing richer APIs, while remaining as compatible as possible with the original Java text and internationalization API design.

ICU4J is an add-on to the regular JRE that provides:

Note: We continue to provide assistance to Sun, and in some cases, ICU4J support has been rolled into a later release of Java. For example, the Thai word-break is now in Java 1.4. However, the most current and complete version is always found in ICU4J.

What Is New In This Release?

Changes to J2SE version requirement for building ICU4J

Previous versions of ICU4J could be built with J2SE SDK 1.4 or later. This release includes a new feature that uses a type introduced in J2SE 5.0. Although the feature does not need that type at run time, enabling it requires the J2SE 5.0 (or newer) class library at build time. The binary distribution of ICU4J available from the ICU download page was built with J2SE SDK 5.0 and should work on JRE 1.4 or later. If you want to build your own ICU4J binaries with J2SE SDK 1.4, you can still run all of the standard Ant build targets in build.xml. The build script detects the current Java version and comments out code blocks that reference J2SE 5.0-only types.

Changes to timezone formatting and parsing

In ICU 3.8, the behavior of date formatting and parsing has changed significantly, perhaps requiring recoding on your part depending on your usage. For more information, see Formatting Dates and Times in the User Guide.

Status of ICU4J charset converter

The ICU4J implementation of java.nio.charset.Charset is included as a Technology Preview. Not all functionality from the java.nio.charset interfaces is operational, and some converters are known to mishandle Unicode supplementary characters. Use with caution.
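
Because some converters may mishandle supplementary characters, it can help to check a converter before relying on it. The following is a minimal sketch that uses only standard java.nio.charset calls; the class and method names are hypothetical, and it assumes a JDK with Java 7 syntax, which is newer than the JREs this release targets. It encodes a string containing a supplementary character and decodes it again, reporting whether the text survived unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class CharsetRoundTrip {
    /** Returns true if text survives encode-then-decode through cs unchanged. */
    static boolean roundTrips(Charset cs, String text) {
        try {
            // REPORT makes the coder throw instead of silently substituting.
            ByteBuffer bytes = cs.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text));
            CharBuffer decoded = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes);
            return text.equals(decoded.toString());
        } catch (CharacterCodingException | UnsupportedOperationException e) {
            // Unmappable input, malformed output, or a charset that cannot encode.
            return false;
        }
    }

    public static void main(String[] args) {
        // U+20000 is a supplementary character, written here as a surrogate pair.
        String sample = "A\uD840\uDC00B";
        // Assumption: the charset name is passed as the first argument.
        Charset cs = Charset.forName(args.length > 0 ? args[0] : "UTF-8");
        System.out.println(cs.name() + " round-trips the sample: " + roundTrips(cs, sample));
    }
}
```

The helper accepts any Charset instance, so it can be pointed at a converter from icu4j-charsets.jar once you have obtained one; how the converter is looked up is outside the scope of this sketch.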

New features

See the ICU 3.8 download page about new features in this release.

License Information

The ICU projects (ICU4C and ICU4J) use the X license. The X license is suitable for commercial use and is a recommended free software license that is compatible with the GNU GPL license. This became effective with release 1.8.1 of ICU4C and release 1.3.1 of ICU4J in mid-2001. All new ICU releases will adopt the X license; previous ICU releases continue to utilize the IPL (IBM Public License). Users of previous releases of ICU who want to adopt new ICU releases will need to accept the terms and conditions of the X license.

The main effect of the change is to provide GPL compatibility. The X license is listed as GPL compatible; see the GNU page at http://www.gnu.org/philosophy/license-list.html#GPLCompatibleLicenses. This means that GPL projects can now use ICU code; it does not mean that projects using ICU become subject to the GPL.

The IBM version contains the essential text of the license, omitting the X-specific trademarks and copyright notices. The full copy of ICU's license is included in the download package.

Platform Dependencies

By default, ICU4J depends on functionality that is only available in J2SE 1.4 or later releases. Although some new ICU4J features support types introduced in J2SE 5, you can still use the same ICU4J binaries on JRE 1.4. We provide the ability to build a variant of ICU4J that will run on JRE 1.3, but not all build targets work on that platform. Currently 1.1.x and 1.2.x JREs are unsupported and untested, and you use the components on these JREs at your own risk.

The table below shows operating systems and JRE/JDK versions currently used by the ICU development team.

Operating System | Sun Java SE 1.6.0 | Sun Java SE 1.5.0 | Sun Java SE 1.4.2 | Sun Java SE 1.4.1 | Sun Java SE 1.4.0 | IBM Java SE 1.5.0 | IBM Java SE 1.4.2 | IBM Java SE 1.4.1
AIX 5.2 | - | - | - | - | - | Regularly tested | Regularly tested | Rarely tested
AIX 5.3 | - | - | - | - | - | Reference platform | Regularly tested | Rarely tested
HP-UX 11 (PA-RISC) | - | Regularly tested | Regularly tested | - | - | - | - | -
HP-UX 11 (IA64) | - | Regularly tested | Regularly tested | - | - | - | - | -
Redhat Enterprise Linux 4 (x86) | Regularly tested | Regularly tested | Regularly tested | Rarely tested | Rarely tested | Regularly tested | Regularly tested | -
Redhat Enterprise Linux 5 (x86) | Regularly tested | Regularly tested | Regularly tested | Rarely tested | Rarely tested | Regularly tested | Regularly tested | -
Solaris 9 (SPARC) | Regularly tested | Regularly tested | Regularly tested | Rarely tested | Rarely tested | - | - | -
Solaris 10 (SPARC) | Regularly tested | Reference platform | Regularly tested | Rarely tested | Rarely tested | - | - | -
Windows XP | Regularly tested | Regularly tested | Regularly tested | - | - | Reference platform | Regularly tested | Rarely tested
Windows Vista | Regularly tested | Regularly tested | Regularly tested | - | - | Regularly tested | Regularly tested | Rarely tested

How to Download ICU4J

There are two ways to download the ICU4J releases.

For more details on how to download ICU4J directly from the web site, please see the ICU downloads page at http://www.icu-project.org/download/

The Structure and Contents of ICU4J

Below, $icu4j_root is the location of the icu4j directory in your file system, like "drive:\...\icu4j" in your environment. "drive:\..." stands for any drive and any directory on that drive that you chose to install icu4j into.

Information and build files:

readme.html (this file) A description of ICU4J (International Components for Unicode for Java)
license.html The X license, used by ICU4J
build.xml Ant build file. See How to Install and Build for more information

The source directories mirror the package structure of the code.
Core packages become part of the ICU4J jar file.
Charset packages become part of the ICU4J charset jar file.
API packages contain classes with supported API.
RichText classes are Core and API, but can be removed from icu4j.jar, and can be built into their own jar.

$icu4j_root/src/com/ibm/icu/charset
Charset, API
Packages that provide Charset conversion
$icu4j_root/src/com/ibm/icu/dev
Non-Core, Non-API
Packages used for internal development:
  • Data: data used by tests and in building ICU
  • Demos: Calendar, Holiday, Break Iterator, Rule-based Number Format, Transformations
    (See below for more information about the demos.)
  • Tests: API and coverage tests of all functionality.
    For information about running the tests, see $icu4j_root/src/com/ibm/icu/dev/test/TestAll.java.
  • Tools: tools used to build data tables, etc.
$icu4j_root/src/com/ibm/icu/impl
Core, Non-API
These are utility classes used from different ICU4J core packages.
$icu4j_root/src/com/ibm/icu/lang
Core, API
Character properties package.
$icu4j_root/src/com/ibm/icu/math
Core, API
Additional math classes.
$icu4j_root/src/com/ibm/icu/text
Core, API
Additional text classes. These add to, and in some cases replace, related core Java classes:
  • Arabic shaping
  • Break iteration
  • Date formatting
  • Number formatting
  • Transliteration
  • Normalization
  • String manipulation
  • Collation
  • String search
  • Unicode compression
  • Unicode sets
$icu4j_root/src/com/ibm/icu/util
Core, API
Additional utility classes:
  • Calendars - Gregorian, Buddhist, Coptic, Ethiopic, Hebrew, Islamic, Japanese, Chinese and others
  • Holiday
  • TimeZone
  • VersionInfo
  • Iteration
$icu4j_root/src/com/ibm/richtext
RichText
Styled text editing package. This includes demos, tests, and GUIs for editing and displaying styled text. The richtext package provides a scrollable display, typing, arrow-key support, tabs, alignment and justification, word- and sentence-selection (by double-clicking and triple-clicking, respectively), text styles, clipboard operations (cut, copy and paste) and a log of changes for undo-redo. Richtext uses Java's TextLayout and complex text support (provided to Sun by the ICU4J team).

Building ICU4J creates and populates the following directories:

$icu4j_root/classes contains all class files
$icu4j_root/doc contains JavaDoc for all packages

ICU4J data is stored in the following locations:

com.ibm.icu.impl.data Holds data used by the ICU4J core packages (com.ibm.icu.lang, com.ibm.icu.text, com.ibm.icu.util and com.ibm.icu.math). In particular, all resource information is stored here.
com.ibm.icu.dev.data Holds data that is not part of ICU4J core, but rather part of a test, sample, or demo.

Where to get Documentation

The ICU user's guide contains lots of general information about ICU, in its C, C++, and Java incarnations.

The complete API documentation for ICU4J (javadoc) is available on the ICU4J web site, and can be built from the sources with the docs Ant target described in How to Install and Build.

How to Install and Build

To install ICU4J, simply place the prebuilt jar file icu4j.jar on your Java CLASSPATH. If you need Charset API support, also place icu4j-charsets.jar on your class path. No other files are needed.
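
One way to confirm that the jar is actually visible to your application is to probe for a class from it. The following is a minimal sketch with a hypothetical helper class; it uses reflection so that it compiles without ICU4J present, and it assumes Java 7 syntax. com.ibm.icu.util.VersionInfo is used as the probe because VersionInfo is listed among the core utility classes.

```java
final class ClasspathCheck {
    /** Returns true if the named class can be located by this class's loader. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe class from the core jar; pass another class name to check a different jar.
        String probe = args.length > 0 ? args[0] : "com.ibm.icu.util.VersionInfo";
        System.out.println(probe + " on class path: " + isAvailable(probe));
    }
}
```

Running it with the same class path as your application shows whether icu4j.jar was picked up; a class from the com.ibm.icu.charset package can be passed as the argument to check icu4j-charsets.jar in the same way.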

Eclipse users: See the ICU4J site for information on how to configure Eclipse to build ICU4J.

To build ICU4J, you will need a J2SE SDK and the Ant build system. We strongly recommend using Ant to build ICU4J, and we also recommend installing both the J2SE SDK and Ant somewhere outside the ICU4J directory. For example, on Linux you might install these in /usr/local.

Once the J2SE SDK and Ant are installed, building is just a matter of typing ant in the ICU4J root directory. This causes the Ant build system to perform a build as specified by the file build.xml, located in the ICU4J root directory. You can give Ant options like -verbose, and you can specify targets. Ant will only build what's been changed and will resolve dependencies properly. For example:

C:\icu4j>ant
Buildfile: build.xml

checkAntVersion:

warnAntVersion:

initBase:
    [mkdir] Created dir: C:\icu4j\classes
     [echo] java home: C:\jdk1.5.0
     [echo] java version: 1.5.0
     [echo] ant java version: 1.5
     [echo] Apache Ant version 1.7.0 compiled on December 13 2006
     [echo] ICU4JDEV with Windows XP 5.1 build 2600 Service Pack 2 on x86
     [echo] clover initstring = '${clover.initstring}'
     [echo] target runtime environment: J2SE15
     [echo] Initialized at 2007-08-30 at 04:14:09 EDT

buildMangle:
    [javac] Compiling 1 source file to C:\icu4j\classes

initSrc:

displayBuildEnvWarning:

doMangle:
     [echo] Running source code preprocessor for [J2SE15]

init:

coreData:
     [copy] Copying 1 file to C:\icu4j\classes\com\ibm\icu\impl\data

icudata:
    [unjar] Expanding: C:\icu4j\src\com\ibm\icu\impl\data\icudata.jar into C:\icu4j\classes
     [copy] Copying 1 file to C:\icu4j\classes\META-INF

durationdata:
     [copy] Copying 16 files to C:\icu4j\classes\com\ibm\icu\impl\duration\impl\data

core:
    [javac] Compiling 317 source files to C:\icu4j\classes
    [javac] Note: * uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

BUILD SUCCESSFUL
Total time: 10 seconds
Note: The above output is an example. The numbers are likely to be different with the current version of ICU4J.

The following are some targets that you can provide to ant. For more targets run ant -projecthelp or see the build.xml file.

all Build all targets.
core Build the main class files in the subdirectory classes. If no target is specified, core is assumed.
tests Build the test class files.
demos Build the demos.
tools Build the tools.
docs Run javadoc over the main class files, generating an HTML documentation tree in the subdirectory doc.
jar Create a jar archive icu4j.jar in the root ICU4J directory containing the main class files.
jarSrc Like the jar target, but containing only the source files.
jarDocs Like the jar target, but containing only the docs.
richedit Build the richedit core class files and tests.
richeditJar Create the richedit jar file (which contains only the richedit core class files). The file richedit.jar will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten.
richeditZip Create a zip archive of the richedit docs and jar file for distribution. The zip file richedit.zip will be created in the ./richedit subdirectory. Any existing file of that name will be overwritten.
clean Remove all built targets, leaving the source.

For more information, read the Ant documentation and the build.xml file.

After a build, it is a good idea to run all the ICU4J tests by typing
"ant check" or "java -classpath classes com.ibm.icu.dev.test.TestAll -nothrow".

How to modularize ICU4J

Some clients may not wish to ship all of ICU4J with their application, since the application might only use a small part of ICU4J. ICU4J release 2.6 and later provide build options to build individual ICU4J 'modules' for a more compact distribution. The modules are based on a service and the APIs that define it, e.g., the normalizer module supports all the APIs of the Normalizer class (and some others). Tests can be run to verify that the APIs supported by the module function correctly. Because of internal code dependencies, a module contains extra classes that are not part of the module's core service API. Some or most of the APIs of these extra classes will not work. Only the module's core service API is guaranteed. Other APIs may work partially or not at all, so client code should avoid them.

Individual modules are not built directly into their own separate jar files. Since their dependencies often overlap, using separate modules to 'add on' ICU4J functionality would result in unwanted duplication of class files. Instead, building a module causes a subset of ICU4J's classes to be built and put into ICU4J's standard build directory. After one or more module targets are built, the 'moduleJar' target can then be built, which packages the class files into a 'module jar.' Other than the fact that it contains fewer class files, little distinguishes this jar file from a full ICU4J jar file, and in fact they share the same name.

Currently ICU4J can be divided into the following modules:

Key:

Module Name Ant Targets Test Package Supported Size‡
Package* Main Classes†
* com.ibm. should be prepended to the package names listed.
† Class name in bold indicates core service API. Only APIs in these classes are fully supported.
‡ Sizes are of the compressed jar file containing only this module. These sizes are approximate for release 3.6.

Modules:

Normalizer normalizer, normalizerTests com.ibm.icu.dev.test.normalizer 465 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript
icu.text: BreakIterator, CanonicalIterator, Normalizer, Replaceable, ReplaceableString, SymbolTable, UCharacterIterator, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: Freezable, RangeValueIterator, StringTokenizer, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
Collator collator, collatorTests com.ibm.icu.dev.test.collator 1,911 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript
icu.text: BreakDictionary, BreakIterator, CanonicalIterator, CollationElementIterator, CollationKey, Collator, DictionaryBasedBreakIterator, Normalizer, RawCollationKey, Replaceable, ReplaceableString, RuleBasedBreakIterator, RuleBasedCollator, SymbolTable, UCharacterIterator, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: ByteArrayWrapper, CompactByteArray, Freezable, RangeValueIterator, StringTokenizer, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
Calendar calendar, calendarTests com.ibm.icu.dev.test.calendar 2,176 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript
icu.math: BigDecimal, MathContext
icu.text: BreakIterator, CanonicalIterator, ChineseDateFormat, ChineseDateFormatSymbols, CollationElementIterator, CollationKey, Collator, DateFormat, DateFormatSymbols, DecimalFormat, DecimalFormatSymbols, MessageFormat, Normalizer, NumberFormat, PluralFormat, PluralRules, RawCollationKey, Replaceable, ReplaceableString, RuleBasedCollator, RuleBasedNumberFormat, RuleBasedTransliterator, SimpleDateFormat, SymbolTable, UCharacterIterator, UFormat, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: AnnualTimeZoneRule, BasicTimeZone, BuddhistCalendar, ByteArrayWrapper, Calendar, ChineseCalendar, CopticCalendar, Currency, CurrencyAmount, DateRule, DateTimeRule, EasterHoliday, EthiopicCalendar, Freezable, GregorianCalendar, HebrewCalendar, HebrewHoliday, Holiday, IndianCalendar, InitialTimeZoneRule, IslamicCalendar, JapaneseCalendar, Measure, MeasureUnit, RangeDateRule, RangeValueIterator, SimpleDateRule, SimpleHoliday, SimpleTimeZone, StringTokenizer, TaiwanCalendar, TimeZone, TimeZoneRule, TimeZoneTransition, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
BreakIterator breakIterator, breakIteratorTests com.ibm.icu.dev.test.breakiterator 1,889 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript
icu.text: BreakDictionary, BreakIterator, CanonicalIterator, DictionaryBasedBreakIterator, Normalizer, Replaceable, ReplaceableString, RuleBasedBreakIterator, SymbolTable, Transliterator, UCharacterIterator, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: CompactByteArray, Freezable, RangeValueIterator, StringTokenizer, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
Basic Properties propertiesBasic, propertiesBasicTests com.ibm.icu.dev.test.lang 554 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript, UScriptRun
icu.text: BreakDictionary, BreakIterator, DictionaryBasedBreakIterator, Normalizer, Replaceable, ReplaceableString, RuleBasedBreakIterator, SymbolTable, UCharacterIterator, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: CompactByteArray, Freezable, RangeValueIterator, StringTokenizer, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
Full Properties propertiesFull, propertiesFullTests com.ibm.icu.dev.test.lang 1,829 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript, UScriptRun
icu.text: BreakDictionary, BreakIterator, DictionaryBasedBreakIterator, Normalizer, Replaceable, ReplaceableString, RuleBasedBreakIterator, SymbolTable, UCharacterIterator, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: CompactByteArray, Freezable, RangeValueIterator, StringTokenizer, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
Formatting format, formatTests com.ibm.icu.dev.test.format 3,443 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript
icu.math: BigDecimal, MathContext
icu.text: BreakIterator, CanonicalIterator, ChineseDateFormat, ChineseDateFormatSymbols, CollationElementIterator, CollationKey, Collator, DateFormat, DateFormatSymbols, DecimalFormat, DecimalFormatSymbols, DurationFormat, MeasureFormat, MessageFormat, Normalizer, NumberFormat, PluralFormat, PluralRules, RawCollationKey, Replaceable, ReplaceableString, RuleBasedCollator, RuleBasedNumberFormat, SimpleDateFormat, SymbolTable, UCharacterIterator, UFormat, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: AnnualTimeZoneRule, BasicTimeZone, BuddhistCalendar, ByteArrayWrapper, Calendar, ChineseCalendar, CopticCalendar, Currency, CurrencyAmount, DateTimeRule, EthiopicCalendar, Freezable, GregorianCalendar, HebrewCalendar, IndianCalendar, InitialTimeZoneRule, IslamicCalendar, JapaneseCalendar, Measure, MeasureUnit, RangeValueIterator, SimpleTimeZone, StringTokenizer, TaiwanCalendar, TimeArrayTimeZoneRule, TimeZone, TimeZoneRule, TimeZoneTransition, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
StringPrep, IDNA stringPrep, stringPrepTests com.ibm.icu.dev.test.stringprep 488 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript
icu.text: StringPrep, StringParseException, SymbolTable, UCharacterIterator, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: Freezable, RangeValueIterator, StringTokenizer, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo
Transforms transliterator, transliteratorTests com.ibm.icu.dev.test.translit 890 KB
icu.lang: UCharacter, UCharacterCategory, UCharacterDirection, UCharacterEnums, UProperty, UScript
icu.text: BreakDictionary, BreakIterator, DictionaryBasedBreakIterator, Normalizer, Replaceable, ReplaceableString, RuleBasedBreakIterator, RuleBasedCollator, RuleBasedTransliterator, StringTransform, SymbolTable, Transliterator, UCharacterIterator, UForwardCharacterIterator, UnicodeFilter, UnicodeMatcher, UnicodeSet, UnicodeSetIterator, UTF16
icu.util: CaseInsensitiveString, CompactByteArray, Freezable, RangeValueIterator, StringTokenizer, ULocale, UResourceBundle, UResourceBundleIterator, UResourceTypeMismatchException, ValueIterator, VersionInfo

Building any of these modules is as easy as specifying a build target to the Ant build system.
For example, to build a module that contains only the Normalizer API:

  1. Build the module.
    ant normalizer
  2. Build the jar containing the module.
    ant moduleJar
  3. Build the tests for the module.
    ant normalizerTests
  4. Run the tests and verify that the self tests pass.
    java -classpath classes com.ibm.icu.dev.test.TestAll -nothrow -w
If more than one module is required, the module build targets can be concatenated, e.g.:
  1. Build the modules.
    ant normalizer collator
  2. Build the jar containing the modules.
    ant moduleJar
  3. Build the tests for the modules.
    ant normalizerTests collatorTests
  4. Run the tests and verify that they pass.
    java -classpath classes com.ibm.icu.dev.test.TestAll -nothrow -w
The jar should be built before the tests, since for some targets building the tests will cause additional classes to be compiled that are not strictly necessary for the module itself.
Notes:

Trying Out ICU4J

Note: the demos provided with ICU4J are for the most part undocumented. This list can show you where to look, but you'll have to experiment a bit. The demos (with the exception of richedit) are unsupported and may change or disappear without notice.

The icu4j.jar file contains only the core ICU4J classes, not the demo classes, so unless you build ICU4J there is little to try out.

Charset

To try out the Charset package, build icu4j.jar and icu4j-charsets.jar using the 'jar' target. You can use the charsets by placing these files on your classpath.
java -cp $icu4j_root/icu4j.jar:$icu4j_root/icu4j-charsets.jar <your program>

Rich Edit

To try out the richedit package, first build the richeditJar target. This is a 'runnable' jar file. To run the richedit demo, type:
java -jar $icu4j_root/richedit/richedit.jar
This will present an empty edit pane with an awt interface.

With a fuller command line you can try out other options, for example:

java -classpath $icu4j_root/richedit/richedit.jar com.ibm.richtext.demo.EditDemo [-swing][file]

This will use an AWT GUI, or a Swing GUI if -swing is passed on the command line. It will open a text file if one is provided; otherwise it will open a blank page. Click to type.

You can add tabs to the tab ruler by clicking in the ruler while holding down the control key. Clicking on an existing tab changes between left, right, center, and decimal tabs. Dragging a tab moves it; dragging it off the ruler removes it.

You can experiment with complex text by using the keymap functions. Please note that these are mainly for demo purposes; for real work with Arabic or Hebrew you will want to use an input method. You will need to use a font that supports Arabic or Hebrew. 'Lucida Sans' (provided with Java) supports these languages.

Other demos

The other demo programs are not supported and exist only to let you experiment with the ICU4J classes. First, build ICU4J using ant all. Then try one of the following:

ICU4J Resource Information

Starting with release 2.1, ICU4J includes its own resource information, which is completely independent of the JRE resource information. (Note: in ICU4J 2.8 through 3.4, time zone information depends on the underlying JRE.) The new ICU4J information is equivalent to the information in ICU4C, and many resources are, in fact, the same binary files that ICU4C uses.

By default the ICU4J distribution includes all of the standard resource information. It is located under the directory com/ibm/icu/impl/data. Depending on the service, the data is in different locations and in different formats. Note: This will continue to change from release to release, so clients should not depend on the exact organization of the data in ICU4J.

Some of the data files alias or otherwise reference data from other data files. One reason for this is that some locale names have changed. For example, he_IL used to be iw_IL. In order to support both names but not duplicate the data, one of the resource files refers to the other file's data. In other cases, a file may alias a portion of another file's data in order to save space. Currently ICU4J provides no tool for revealing these dependencies.

Note: Java's Locale class silently converts the language code "he" to "iw" when you construct the Locale (for versions of Java through Java 5). Thus Java cannot be used to locate resources that use the "he" language code. ICU, on the other hand, does not perform this conversion in ULocale, and instead uses aliasing in the locale data to represent the same set of data under different locale ids.
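
To see which behavior your own JRE has, you can ask java.util.Locale directly. The following is a minimal sketch with a hypothetical class name; it uses only the standard Locale constructor and getLanguage(). The result depends on the JRE in use, so no particular output is assumed here.

```java
import java.util.Locale;

final class HebrewLocaleCode {
    /** Returns the language code that java.util.Locale reports for the given input code. */
    static String reportedLanguage(String languageCode) {
        // The conversion described above, where it applies, happens at construction time.
        return new Locale(languageCode).getLanguage();
    }

    public static void main(String[] args) {
        // A JRE that performs the conversion reports "iw" for the input "he".
        System.out.println("he -> " + reportedLanguage("he"));
    }
}
```

If the reported code is "iw", resource names built from a java.util.Locale will not contain "he", which is the situation the note above describes.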

Resource files that use locale ids form a hierarchy, with up to four levels: a root, language, region (country), and variant. Searches for locale data attempt to match as far down the hierarchy as possible. For example, "he_IL" will match he_IL, but "he_US" will match he (since there is no US variant for he), and "xx_YY" will match root (the default fallback locale) since there is no xx language code in the locale hierarchy. See java.util.ResourceBundle for more information.
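
The matching rule can be illustrated with a small sketch. This is a simplification written for this readme, not ICU4J's actual lookup code: the class, the method names, and the set of available ids are hypothetical, aliasing is ignored, and Java 7 syntax is assumed. It strips one trailing "_" segment at a time and falls back to root.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LocaleFallback {
    /** Returns the candidate ids for localeId, most specific first, ending with "root". */
    static List<String> candidates(String localeId) {
        List<String> chain = new ArrayList<>();
        String id = localeId;
        while (!id.isEmpty() && !id.equals("root")) {
            chain.add(id);
            int cut = id.lastIndexOf('_');
            // Drop the last segment: he_IL -> he -> (empty).
            id = (cut < 0) ? "" : id.substring(0, cut);
        }
        chain.add("root");
        return chain;
    }

    /** Returns the most specific candidate that is present in the available set. */
    static String resolve(String localeId, Set<String> available) {
        for (String candidate : candidates(localeId)) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        return "root";
    }

    public static void main(String[] args) {
        // Hypothetical set of available resource ids, taken from the example in the text.
        Set<String> available = new HashSet<>(Arrays.asList("root", "he", "he_IL"));
        for (String id : new String[] {"he_IL", "he_US", "xx_YY"}) {
            System.out.println(id + " -> " + resolve(id, available));
        }
    }
}
```

With the three ids from the text, the sketch resolves "he_IL" to he_IL, "he_US" to he, and "xx_YY" to root.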

Currently ICU4J provides no tool for revealing these dependencies between data files, so trimming the data directly in the ICU4J project is a hit-or-miss affair. The key point when you remove data is to make sure to remove all dependencies on that data as well. For example, if you remove he.res, you need to remove he_IL.res, since it is lower in the hierarchy, and you must remove iw.res, since it references he.res, and iw_IL.res, since it depends on it (and also references he_IL.res).
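
The bookkeeping in this example can be sketched as a small closure computation. This is only an illustration: the class and method names are hypothetical, the reference map is written by hand from the example above (ICU4J does not expose this information), and Java 7 syntax is assumed. Ids are given without the .res extension.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ResourceRemoval {
    /**
     * Returns every id that must go when target is removed: ids below it in the
     * hierarchy, ids that reference it, and anything that depends on those in turn.
     */
    static Set<String> removalSet(String target, Set<String> all, Map<String, String> references) {
        Set<String> remove = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(target);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            if (!remove.add(current)) {
                continue; // already handled
            }
            for (String name : all) {
                boolean below = name.startsWith(current + "_");        // he_IL is below he
                boolean refers = current.equals(references.get(name)); // iw references he
                if (below || refers) {
                    pending.add(name);
                }
            }
        }
        return remove;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(Arrays.asList("root", "he", "he_IL", "iw", "iw_IL"));
        // Hand-written from the example in the text; not read from ICU4J data.
        Map<String, String> references = new HashMap<>();
        references.put("iw", "he");
        references.put("iw_IL", "he_IL");
        System.out.println("Removing he also requires removing: " + removalSet("he", all, references));
    }
}
```

For the example data, removing he pulls in he_IL, iw, and iw_IL, which matches the list given above. For real data the reference map has to be built by inspecting the resource files.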

Unfortunately, the jar tool in the JDK provides no way to remove items from a jar file. Thus you have to extract the resources, remove the ones you don't want, and then create a new jar file with the remaining resources. See the jar tool information for how to do this. Before 'rejarring' the files, be sure to thoroughly test your application with the remaining resources, making sure each required resource is present.
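
As an alternative to extracting and re-creating the jar by hand, the same filtering can be sketched with the standard java.util.jar classes. This is not part of ICU4J: the class name and command-line arguments are hypothetical, entry names must be given exactly as they appear in the jar, and Java 7 syntax is assumed. The sketch writes a new jar and leaves the original untouched.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

final class JarFilter {
    /** Copies source to target, skipping the named entries; returns how many were skipped. */
    static int copyWithout(File source, File target, Set<String> unwanted) throws IOException {
        int skipped = 0;
        byte[] buffer = new byte[8192];
        try (JarFile jar = new JarFile(source);
             JarOutputStream out = new JarOutputStream(new FileOutputStream(target))) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (unwanted.contains(entry.getName())) {
                    skipped++;
                    continue;
                }
                // Create a fresh entry by name so size and checksum are recomputed.
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream in = jar.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                out.closeEntry();
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: JarFilter <source.jar> <target.jar> <entry-name>...");
            return;
        }
        Set<String> unwanted = new HashSet<>(Arrays.asList(args).subList(2, args.length));
        int skipped = copyWithout(new File(args[0]), new File(args[1]), unwanted);
        System.out.println("Skipped " + skipped + " entries");
    }
}
```

The same caution applies as with the manual procedure: test your application against the trimmed jar before shipping it.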

Using additional resource files with ICU4J

Warning: Resource file formats can change across releases of ICU4J!
The format of ICU4J resources is not part of the API. Clients who develop their own resources for use with ICU4J should be prepared to regenerate them when they move to new releases of ICU4J.

We are still developing ICU4J's resource mechanism. Currently it is not possible to mix ICU's new binary .res resources with traditional Java-style .class or .txt resources. We might allow for this in a future release, but since the resource data and format are not formally supported, you run the risk of incompatibilities with future releases of ICU4J.

Resource data in ICU4J is checked in to the repository as a jar file containing the resource binaries, icudata.jar. This means that inspecting the contents of these resources is difficult. They currently are compiled from ICU4C .txt file data. You can view the contents of the ICU4C text resource files to understand the contents of the ICU4J resources.

The files in icudata.jar get extracted to com/ibm/icu/impl/data in the build directory when the 'core' target is built. Building the 'resources' target will force the resources to once again be extracted. Extraction will overwrite any corresponding resource files already in that directory.

Building ICU4J Resources from ICU4C

Requirements
Procedure
  1. Download and build ICU4C. For instructions on downloading and building ICU4C, see the ICU4C documentation.
  2. Change directory to $icu4c_root/source/tools/genrb. $icu4c_root is the root directory of ICU4C source package.
  3. Run gendtjar.pl from that directory itself with the command
       ./gendtjar.pl --icu-root=$icu4c_root --jar=$jdk_home/bin --icu4j-root=$icu4j_root
    e.g.
       ./gendtjar.pl --icu-root=$HOME/icu4c --jar=/usr/local/bin/java/bin/ --icu4j-root=$HOME/icu4j
    Running the gendtjar.pl script creates the required jar files in the $icu4c_root/source/tools/genrb/temp directory and then copies them to their final locations in the ICU4J structure, $icu4j_root/src/com/ibm/icu/impl/data and $icu4j_root/src/com/ibm/icu/dev/data.
  4. Build the resources target with Ant to unpack the jar files, using the following commands:
       cd $icu4j_root
       ant resources
Note: if gendtjar.pl does not work, the --verbose option can help in debugging why it went wrong.
Generating Data from CLDR
Note: This procedure assumes that all three sources (CLDR, ICU4C, and ICU4J) are in sibling directories.
  1. Check out CLDR. $cldr_root in the following steps is the root directory where the CLDR source files are checked out.
  2. Update $cldr_root/common to the 'release-1-5-0-1' tag.
  3. Update $cldr_root/tools to the 'release-1-5-0-1' tag.
  4. Check out ICU4C with the tag 'release-3-8'.
  5. Check out ICU4J with the tag 'release-3-8'.
  6. Build ICU4J.
  7. Build ICU4C.
  8. Change to the $cldr_root/tools/java directory.
  9. Build CLDR using Ant after pointing the ICU4J_CLASSES environment variable to the newly built ICU4J.
  10. Change to the $icu4c_root/source/data directory.
  11. Follow the instructions in cldr-icu-readme.txt.
  12. Build ICU data from CLDR.
  13. Change to $icu4c_root/source/tools/genrb.
  14. Run gendtjar.pl as explained in the previous section.
  15. Change to the $icu4j_root directory.
  16. Build and test ICU4J.

Where to Find More Information

http://www.ibm.com/software/globalization/icu/ is a pointer to general information about the International Components for Unicode in Java.

http://www.ibm.com/software/globalization/ is a pointer to information on how to make applications global.

Submitting Comments, Requesting Features and Reporting Bugs

Your comments are important to making ICU4J successful. We are committed to fixing any bugs, and will use your feedback to help plan future releases.

To submit comments, request features and report bugs, contact us through the ICU Support mailing list.
While we are not able to respond individually to each comment, we do review all comments.



Thank you for your interest in ICU4J!



Copyright © 2002-2007 International Business Machines Corporation and others. All Rights Reserved.
4400 North First Street, San José, CA 95193, USA