
<html>
<head>
<title>ICU4J 2.2 Release Notes</title>
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<!--
*******************************************************************************
* Copyright (C) 2001-2002, International Business Machines Corporation and    *
* others. All Rights Reserved.                                                *
*******************************************************************************
*
* $Source: /xsrl/Nsvn/icu/icu4j/Attic/releasenotes.html,v $
* $Date: 2002/08/15 06:09:05 $
* $Revision: 1.2 $
*
*******************************************************************************
-->
</head>

<body bgcolor="#FFFFFF">
<!--#include virtual="/icu/ssi/header.html" -->
    <h2 align="center">International Components for Unicode for Java</h2>
    <h3 align="center">Release Notes for ICU4J 2.2</h3>
    <hr>
    <P>
      <B>This Release - Version 2.2</B><BR><br>
      This release note corresponds to ICU4J version 2.2, released on or around 15 Aug 2002.  For the current
      release, see the <a href="http://oss.software.ibm.com/icu4j/download/index.html">
      ICU4J download site</a>.
    </P>
    <P><B>License</B></P>
    <P>
    Please read and understand the <a href="./license.html">license</a>
    included with this release before installing and using the ICU4J libraries.
    </P>

    <p><B>What's new in Release 2.2</B></p>

<p>(Please also note the <b>packaging</b> and <b>data changes</b> introduced with ICU4J 2.1,
<a href="#2.1changes">described below</a>.)</p>
    <ul>
      <li><a href="#collation">Collation Enhancements</a></li>
      <ul>
        <li>Faster</li>
	<li>Smaller CollationKeys</li>
	<li>More options, for example, case ordering</li>
        <li>Supports Unicode 3.2</li>
      </ul>
      <li>Normalizer Enhancements</li>
      <ul>
	<li>Faster</li>
        <li>Supports Unicode 3.2</li>
      </ul>
      <li>Transform (nee Transliterator) enhancements</li>
      <ul>
        <li>Added getSourceSet() and getTargetSet()</li>
        <li>Added Any-X transliterator, with 16 new system transliterators</li>
      </ul>
      <li>UnicodeSet enhancements</li>
      <ul>
        <li>Unicode 3.2 binary properties</li>
        <li>Full support for multicharacter strings</li>
        <li>Performance improvements</li>
      </ul>
      <li>Added Replaceable.hasMetaData()</li>
      <li>Improved foreign currency support (similar to JDK 1.4's)</li>
    </ul>

<p><b>Enhancements from Release 2.1 (partial list)</b></p>
    <ul>
    <li><a href="#repackage">Package restructuring</a></li>
    <li><a name="2.1changes">Changes for JDK 1.4</a>
    <ul>
        <li><a href="#ResourceData">ICU Resource Data added to ICU4J</a></li>
        <li>ICU4J size increases because of the new resource data; clients can trim unneeded resources</li>
    </ul>
    </ul>

     <P><B>Platform Dependencies</B> </P>
      <P>
      Parts of ICU4J depend on functionality that is only available in JDK 1.3
      or later, although some components work under earlier JVMs. All
      components should be compiled using a Java2 compiler, as even components
      that run under earlier JVMs can require language features that are only
      present in Java2. 1.1.x and 1.2.x JVMs are unsupported and untested. Specific to this release, two known non-reference platform compilers fail to compile the ICU4J source code: IBM JDK 1.3.1_02 and Jikes 1.16. Use ICU4J on unsupported JVMs and non-reference platform compilers at your own risk.</P>
      <p>
      The reference platforms which we support and test ICU4J on are:<ul>
<li> Win2000, IBM JDK 1.3
<li> Solaris 2.7, Sun JDK 1.3.1
<li> AIX 5.1, IBM JDK 1.3
</ul>
      </p>

      <P><B>Installation Dependencies</B> </P>
      <UL>
        <LI>To install ICU4J as it is, simply place the prebuilt jar file
        <b>icu4j.jar</b> on your Java CLASSPATH. No other files are
        needed.
        <LI>If building ICU4J is required, you can use the <b>Ant build system</b>.<br>
          The <A href="http://jakarta.apache.org/downloads/binindex.html">Ant</A> build
          system is part of the Apache Software Foundation's Jakarta project. Ant
          is a portable, Java-based build system similar to make. ICU4J uses Ant
          because it introduces no other dependencies, it's portable, and it's
          easier to manage than a collection of makefiles. We currently build
          ICU4J using a single build.xml file on both Windows and Solaris using Ant.
          Installing Ant is straightforward.<BR>Note: It's recommended to
          install both the JDK and Ant somewhere outside the ICU4J directory, to
          keep them out of CVS's hair.<BR>For more information, read the <A
          href="http://jakarta.apache.org/downloads/binindex.html">Ant</A>
          documentation and the <A
          href="http://oss.software.ibm.com/developerworks/opensource/cvs/icu4j/~checkout~/icu4j/build.xml">build.xml</A>
          file.
      </UL>

      <p>
      For further detailed information about the ICU4J library, please refer to the
      <A href="http://oss.software.ibm.com/developerworks/opensource/cvs/icu4j/~checkout~/icu4j/readme.html?only_with_tag=release-2-2">
      ReadMe.</A>
      </p>

<hr size="2" width="100%" align="center">

    <h3><a name="collation">Collation Enhancements</a></h3>

<p>ICU4J's collation has been upgraded and now differs significantly from the JDK's implementation (originally provided by us several years ago).</p>

<p>ICU's collation is, in general, much more efficient than the JDK's. (Generating sort keys does take longer, but the resulting keys are much shorter and more efficient to process.)  For instance:
<ul>
  <li>ICU4J can correctly process FCD format strings with normalization off.  The JDK has no notion of FCD. Much user text is in FCD form (for more information about FCD, see <a href="http://www.unicode.org/notes/tn5/">Unicode Technical Note #5</a>).</li>
  <li>CollationKeys generated by ICU4J are compressed, and as compared with the JDK's, can be up to 70% smaller (e.g. in the case of Latin characters).</li>
  <li>String comparison in ICU is faster than the JDK's. In our tests of Latin characters, it took just 35% of the JDK's time. </li>
</ul></p>

<p>Although ICU4J's collation API is largely compatible with the JDK's, there are some differences.  Here is a listing of the main ones:
<ul>
  <li>ICU4J supports <b>quaternary</b> and <b>identical</b> strength; the JDK does not.</li>
  <li>ICU4J supports extra collation options that the JDK does not:</li>
    <ul>
      <li>alternate handling</li>
      <li>case level sort</li>
      <li>upper case first or lower case first switch</li>
    </ul>
  <li>ICU4J supports Unicode 3.2, while the JDK (as of version 1.4) only supports Unicode 3.0.</li>
  <li>ICU4J does not allow turning off Thai reordering, while the JDK does.  This is because in Unicode 3.2 Thai reordering is always required. The JDK uses '!' in the rules to turn off Thai reordering; ICU4J ignores it.</li>
  <li>ICU4J supports additional rule syntax for various options, for example, setting <b>variable-top</b>, code point collation element positioning, and others.  For details, see the <a href="http://oss.software.ibm.com/icu/userguide/Collate_Customization.html">user's guide</a>.</li>
  <li>ICU4J's version of CollationKey has a public constructor, so subclasses of RuleBasedCollator can create their own CollationKeys.  This was overlooked in the JDK (mea culpa).</li>
  <li>ICU4J does not support FULL_DECOMPOSITION, while the JDK does.</li>
  <li>ICU4J uses its own resource bundles, so the sorting order can differ from the JDK's.</li>
  <li>The CollationKeys generated by ICU4J and the JDK are different, so they cannot be compared.</li>
</ul>
</p>
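
<p>As a minimal sketch of the sort-key workflow that CollationKeys serve, the following uses the JDK's java.text classes so that it runs without ICU4J on the CLASSPATH. The class name SortKeyDemo and the sample words are illustrative assumptions, not part of ICU4J. Because the ICU4J API is largely compatible, the same pattern is expected to carry over to the corresponding classes in com.ibm.icu.text, but keys produced by the two implementations must never be mixed in one comparison.</p>

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class SortKeyDemo {
    /** Sorts the input using collation keys that all come from one collator. */
    static List<String> sortWithKeys(List<String> input, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.TERTIARY);

        // Compute each key once; later comparisons work on the keys only.
        List<CollationKey> keys = new ArrayList<>();
        for (String s : input) {
            keys.add(collator.getCollationKey(s));
        }
        Collections.sort(keys); // CollationKey is Comparable

        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Apple", "banana");
        System.out.println(sortWithKeys(words, Locale.ENGLISH));
    }
}
```

<p>Keys are worth generating when the same strings are compared many times, for example when sorting a large list; for a one-off comparison, calling compare() on the collator directly avoids the key generation cost.</p>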

    <h3><a name="repackage">Package Restructuring</a></h3>

<p>Starting with enhancement release 2.1 of ICU4J, the CVS repository
and package organization have changed.  This helps us to more cleanly
organize the classes, and to clarify relationships and differences
between parts of the project.</p>

<p>The new high-level structure is as follows:<br><tt><pre>
com
   .ibm
       .richtext       ---  root of rich edit control
       .icu            ---  root of icu
            .dev       ---  classes excluded from icu4j.jar (development only)
                .data  ---  data (e.g. unicode data files)
                .demo  ---  demos (e.g. calendar, holiday, translit)
                .test  ---  api tests grouped by functionality
                .tool  ---  tools used in development
            .impl      ---  root of 'internal' classes
                .data  ---  shipped data (text and resources)
            .lang      ---  similar to java.lang
            .math      ---  similar to java.math
            .text      ---  similar to java.text
            .util      ---  similar to java.util
</pre></tt></p>

<p>By and large class names didn't change, only packaging, so changing
the packages in your source should be sufficient to resolve most problems.
The package change <b>will break serialization</b> for those classes that are
serializable.</p>

<p>The classes in com.ibm.icu.impl are <b>internal use only</b>.
Their javadocs are not generated, their APIs are not supported, and
they can change APIs or disappear entirely <b>at any
time</b>.  Many classes in this package are public in order to
facilitate use by classes in multiple other packages, but this should not
be construed to mean that such classes will necessarily be 'promoted'
to full public classes in the future. Clients are warned not to depend
on anything in this package.</p>

    <h3><a name="ResourceData">ICU Resource Data added to ICU4J</a></h3>
<p>Starting with JDK 1.4, the resource information that used to be
available through public classes in java.text.resources is no longer
available.  Sun has moved these classes to an internal package.  This
has two consequences.  One, both the format and contents of the
resources can now change at any time-- dot releases and special bugfix
releases can be different.  Two, the resources are now no longer
accessible without explicit permission by the java user.
</p>
<p>
For these reasons, ICU4J releases 2.1 and above now include their own
resource information
which is completely independent of the JDK resource information.  The
new ICU4J information is equivalent to the information in ICU4C and
ultimately derives from the same source.  This allows ICU4J 2.1 and above
to be
built on, and run on, JDK 1.4.
</p>
<p>
There are two main consequences of this decision.  The first is an
increase in size of ICU4J.  The new resource information, currently
stored as class files residing in a jar file, is approximately 1.15
megabytes.  The second is an increased difference between ICU's
resource information and Java's.  Neither is a clear superset of the
other.  For example, Java core currently has more timezone information
than ICU.  ICU's model for handling currency is also different from
Java's.  This will change over time as new versions of Java and ICU
are released.
</p>
<p>
In addition to the resource information that corresponds to the Java
resource information, ICU4J also includes resource information needed
to support its additional features, such as Transliteration, Calendar,
and DictionaryBasedBreakIterator.  This information has existed in
some form in prior releases of ICU4J and has not greatly changed in
size.
</p>
<h3>How to Remove Unneeded Resource Information</h3>
<p>
This section will focus on resource bundles included since ICU4J
release 2.1.
</p>
<p>
By default the ICU4J distribution includes all of the new resource
information.  It is located in the package com.ibm.icu.impl.data, as a
set of class files named "LocaleElements" followed by the names of
locales in the form _xx_YY_ZZZZ, where 'xx' is the two-letter language
code, 'YY' is the country code, and 'ZZZZ' (which can be any length) is
a variant.  Many of these fields can be omitted.  Locale naming is
documented in the Locale class, java.util.Locale, and the use of these
names in searching for resources is documented in
java.util.ResourceBundle.
</p>
<p>
Some of these files require separate binary data.  The names of the
binary data files start with "CollationElements", then the
corresponding Locale string, and end with '.res'.  Another data file
(only one at the moment) starts with the name "BreakDictionaryData",
the corresponding Locale string, and ends with '.ucs'.
</p>
<p>
Some of the LocaleElements files share data with other LocaleElements
files, because some Locale names have changed. For example, he_IL used
to be iw_IL.  In order to support both names but not duplicate the
data, one of the class files refers to the other class file's data.
</p>
<p>
The list of supported resources is found in a file called
LocaleElements_index.class.  This contains the names of all the
LocaleElements resources and is the source of the information returned
by API such as Calendar.getAvailableLocales. (Note: for ease of
customization this probably should be a text file).
</p>
<p>
LocaleElements files form a hierarchy, with up to four levels: a root,
language, region (country), and variant.  Searches for locale data
attempt to match as far down the hierarchy as possible, for example,
'he_IL' will match LocaleElements_he_IL, but 'he_US' will match
LocaleElements_he (since there is no 'US' region entry for 'he'), and
'xx_YY' will match LocaleElements (since there is no 'xx' language
code in the LocaleElements hierarchy).  Again, see
java.util.ResourceBundle for more information.
</p>
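
<p>The search order described above can be sketched in a few lines of Java. This is a simplified illustration, not the ICU4J or JDK implementation: the class and method names are hypothetical, and it ignores the default-locale step that java.util.ResourceBundle also performs before falling back to the root.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LocaleElementsLookup {
    /** Returns the names to try, most specific first, ending with the root. */
    static List<String> candidates(String localeName) {
        List<String> names = new ArrayList<>();
        String current = localeName;
        while (!current.isEmpty()) {
            names.add("LocaleElements_" + current);
            int cut = current.lastIndexOf('_');
            current = (cut < 0) ? "" : current.substring(0, cut);
            // An omitted country leaves a doubled underscore (zh__PINYIN);
            // drop the leftover one so the next candidate is the language.
            while (current.endsWith("_")) {
                current = current.substring(0, current.length() - 1);
            }
        }
        names.add("LocaleElements"); // root of the hierarchy
        return names;
    }

    /** Returns the first candidate that is present in the available set. */
    static String firstAvailable(String localeName, Set<String> available) {
        for (String name : candidates(localeName)) {
            if (available.contains(name)) {
                return name;
            }
        }
        return null; // not even the root is present
    }

    public static void main(String[] args) {
        Set<String> available = new HashSet<>(Arrays.asList(
                "LocaleElements", "LocaleElements_he", "LocaleElements_he_IL"));
        System.out.println(firstAvailable("he_IL", available));
        System.out.println(firstAvailable("he_US", available));
        System.out.println(firstAvailable("xx_YY", available));
    }
}
```

<p>With the three resources listed in main, the sketch resolves 'he_IL', 'he_US', and 'xx_YY' to LocaleElements_he_IL, LocaleElements_he, and LocaleElements respectively, matching the examples in the text.</p>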
<p>
With this in mind, the way to remove locale data is to make sure to
remove everything that depends on that data as well.  For example, if you
remove LocaleElements_he.class, you need to remove
LocaleElements_he_IL.class, since it is lower in the hierarchy, and
you must remove LocaleElements_iw.class, since it references
LocaleElements_he, and LocaleElements_iw_IL.class, since it depends on
LocaleElements_iw (and also references LocaleElements_he_IL).  For another example,
if you remove CollationElements_zh__PINYIN.res, you must also remove
LocaleElements_zh__PINYIN.class, since it depends on the
CollationElements_zh__PINYIN.res.
</p>
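
<p>Working out the full set of files to remove is a transitive closure over "depends on" relationships. The following sketch shows one way to compute it. The class name, method names, and the hand-written dependency map are assumptions for illustration; the map encodes only the relationships stated in the paragraph above, and ICU4J does not ship such a map.</p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ResourceRemoval {
    /**
     * dependsOn maps a file to the files it needs. Returns the removed file
     * plus every file that directly or indirectly needs it.
     */
    static Set<String> removalClosure(Map<String, List<String>> dependsOn, String removed) {
        Set<String> toRemove = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(removed);
        while (!pending.isEmpty()) {
            String file = pending.remove();
            if (!toRemove.add(file)) {
                continue; // already handled
            }
            for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
                if (entry.getValue().contains(file)) {
                    pending.add(entry.getKey());
                }
            }
        }
        return toRemove;
    }

    /** Dependencies taken from the examples in the text; hypothetical otherwise. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("LocaleElements_he_IL.class", Arrays.asList("LocaleElements_he.class"));
        graph.put("LocaleElements_iw.class", Arrays.asList("LocaleElements_he.class"));
        graph.put("LocaleElements_iw_IL.class",
                Arrays.asList("LocaleElements_iw.class", "LocaleElements_he_IL.class"));
        graph.put("LocaleElements_zh__PINYIN.class",
                Arrays.asList("CollationElements_zh__PINYIN.res"));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(removalClosure(exampleGraph(), "LocaleElements_he.class"));
    }
}
```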

<p>
Unfortunately, the jar tool in the JDK provides no way to remove items
from a jar file.  Thus you have to extract the resources, remove the
ones you don't want, and then create a new jar file with the remaining
resources.  See the jar tool information for how to do this.  Before
'rejarring' the files, be sure to thoroughly test your application with
the remaining resources, making sure each required resource is
present.
</p>
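
<p>As an alternative to extracting and rejarring by hand, a jar can be copied entry by entry while skipping the unwanted resources. The sketch below uses only the JDK's java.util.zip classes; it is not part of the ICU4J build, and the class name JarTrimmer and its method are hypothetical. It copies entry names and contents only, so per-entry metadata such as timestamps is not preserved.</p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class JarTrimmer {
    /** Copies source to target, skipping the named entries. Returns how many were skipped. */
    static int copyWithout(Path source, Path target, Set<String> unwanted) throws IOException {
        int skipped = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                if (unwanted.contains(entry.getName())) {
                    skipped++;
                    continue; // leave this entry out of the new jar
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: JarTrimmer <in.jar> <out.jar> <entry>...");
            return;
        }
        // Entry names use '/' separators, e.g. com/ibm/icu/impl/data/LocaleElements_he.class
        Set<String> unwanted = new HashSet<>(Arrays.asList(args).subList(2, args.length));
        int skipped = copyWithout(Paths.get(args[0]), Paths.get(args[1]), unwanted);
        System.out.println("Skipped " + skipped + " entries");
    }
}
```

<p>The advice above still applies: test the application against the trimmed jar before shipping it.</p>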

<h3>Developing Resources to be used with ICU4J</h3>
<p>
ICU4J 2.1 and above uses the standard class lookup mechanism.  This means
any appropriately named resource on the CLASSPATH will be located, in the
order listed in the classpath.
</p>
<p>
If you create a resource file
com.ibm.icu.impl.data.LocaleElements_xx_YY.class, and list it on the
CLASSPATH before icu4j.jar, your resource will be used in place of any
existing LocaleElements_xx_YY resource in icu4j.  This is a good way
to try out changes to resources.  You can, for example, include the
resource in your application's jar file and list it ahead of
icu4j.jar.
</p>
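
<p>When trying out a replacement resource this way, it helps to confirm which CLASSPATH entry actually supplies the class. The following sketch asks the system class loader where it would find a class file. The class name ResourceOrigin is hypothetical, and LocaleElements_xx_YY is the placeholder name used above rather than a real resource.</p>

```java
import java.net.URL;

final class ResourceOrigin {
    /** Returns the first CLASSPATH location that supplies the class, or null if none does. */
    static URL locate(String className) {
        String path = className.replace('.', '/') + ".class";
        return ClassLoader.getSystemResource(path);
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "com.ibm.icu.impl.data.LocaleElements_xx_YY"; // placeholder name
        URL origin = locate(name);
        if (origin == null) {
            System.out.println(name + " is not on the CLASSPATH");
        } else {
            System.out.println(name + " would be loaded from " + origin);
        }
    }
}
```

<p>If the printed location is inside icu4j.jar rather than your own jar or directory, the replacement resource is listed too late on the CLASSPATH.</p>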
<p>
In order to create new resources, you first must thoroughly understand
the various elements contained in the resource files, their syntax and
dependencies.  You cannot simply 'patch' existing resource files with
a single change because the new file completely replaces the old file
in the resource hierarchy.  In general, the new resource file should
contain all the different data that the old one did, plus your
changes.
</p>
<p>
Adding a new 'leaf' resource is easiest.  Elements defined in that
resource will override corresponding ones in the resources further up
the hierarchy.  Thus you can, for example, try out new localized names
of days of the week, as they are all contained in one element.  The
variant mechanism can be used to temporarily try out new versions of
existing resource elements (though we don't recommend shipping this
way).  Note though that some resources have detailed dependencies on
each other, so that you cannot simply assume that a new element with
the same structure and number of contents will 'just work.'
</p>
<p>
Patching an 'internal' resource (say, one corresponding to an existing
language resource that has children) requires careful analysis of the
contents of the resources.
</p>

<p>
LocaleElements resource data in ICU4J is checked in to the
repository as precompiled class files.  This means that inspecting the
contents of these resources is difficult.  They are compiled from java
files that in turn are machine-generated from ICU4C binary data, using
the genrb tool in ICU4C.  You can view the contents of the ICU4C text
resource files to understand the contents of the ICU4J resources, as
they are the same.
</p>

<h3>Developing ICU4J Resources</h3>
<p>
Currently only the LocaleElements resource data is shared; other ICU
resources (calendar, transliterator, etc.) are still checked in
directly to ICU4J as source files.  This means that development and
maintenance of these resources continues as before; only
LocaleElements resource data has been changed in ICU4J 2.1.  This
probably will change in the future once we work out a reasonable
mechanism for storing and generating the resource data.
</p>
<p>
One goal of using the same resource data as ICU4C is to avoid keeping
redundant copies of the resource data.  Currently there is no separate
repository of the 'master' resource data; it is checked in to ICU4C,
and the tools for converting it to .java files are ICU4C tools.  This
is inconvenient for working in Java, but since maintenance of ICU4J
and ICU4C is supposed to go on 'in parallel,' as a practical matter
people will have to be familiar with development in both C and Java,
and with the conventions and structure of each project.  Additionally,
sharing of data means that modifications to data immediately impact
both projects (as it should) and thus both projects need to be tested
when such changes are made.  The bulk of the tools are currently on
the ICU4C side, and will likely stay that way, so this seems like a
reasonable initial approach to sharing the data.
</p>

<p>
While prototyping of LocaleElements data can occur in either Java or
C, the final version should be checked in to ICU4C in text format.
Genrb is then run to generate the .java and .res files.  They are then
compiled and jar'd into the file ICULocaleData.jar.  The resulting jar
file is then checked in to ICU4J as
src/com/ibm/icu/dev/data/ICULocaleData.jar.  (This is not great but it
allows ICU4J to be downloaded and built as one project, instead of
two, one for locale data and one for ICU4J proper.  Given the 2.2
schedule it wasn't possible to work out the larger data sharing
problem in time, so we tried to limit the impact to just what was
needed to get JDK 1.4 support up and running.)
</p>

<p>
The files in ICULocaleData.jar get extracted to com/ibm/icu/impl/data in
the build directory when the 'core' target is built.  Thereafter, as
long as the file LocaleElements_index.class is untouched, they will
not be extracted again.  Building the 'resource' target will force the
resources to once again be extracted.  Extraction will
overwrite any corresponding .class files already in that directory.
</p>
<hr size="2" width="100%" align="center">
<p><i><font size="-1">Copyright (C) 2002 International Business Machines Corporation and others.  All Rights Reserved.</font></i></p>
<!--#include virtual="/icu/ssi/footer.html" -->
</body>
</html>