International Components for Unicode
ICU 49 ReadMe

Last updated: 2011-June-24
Copyright © 1997-2011 International Business Machines Corporation and others. All Rights Reserved.




Introduction

Today's software market is a global one in which it is desirable to develop and maintain one application (single source/single binary) that supports a wide variety of languages. The International Components for Unicode (ICU) libraries provide robust and full-featured Unicode services on a wide variety of platforms to support this design goal. The ICU libraries provide support for:

ICU has a sister project ICU4J that extends the internationalization capabilities of Java to a level similar to ICU. The ICU C/C++ project is also called ICU4C when a distinction is necessary.

Getting started

This document describes how to build and install ICU on your machine. For other information about ICU, please see the following table of links.
The ICU homepage also links to related information about writing internationalized software.

Here are some useful links regarding ICU and internationalization in general.
ICU, ICU4C & ICU4J Homepage http://icu-project.org/
FAQ - Frequently Asked Questions about ICU http://userguide.icu-project.org/icufaq
ICU User's Guide http://userguide.icu-project.org/
How To Use ICU http://userguide.icu-project.org/howtouseicu
Download ICU Releases http://site.icu-project.org/download
ICU4C API Documentation Online http://icu-project.org/apiref/icu4c/
Online ICU Demos http://demo.icu-project.org/icu-bin/icudemos
Contacts and Bug Reports/Feature Requests http://site.icu-project.org/contacts

Important: Please make sure you understand the Copyright and License Information.

What is new in this release?

To see which APIs are new or changed in this release, view the ICU4C API Change Report.

The following list concentrates on changes that affect existing applications migrating from previous ICU releases. For more news about this release, see the ICU download page.

MessageFormat Changes

MessageFormat and related classes (choice/plural/select) have been reimplemented, with several improvements and some incompatible changes. See the ICU 4.8 download page for details.

Unknown system time zone - Etc/Unknown

The behavior of the time zone factory method TimeZone::createTimeZone(const UnicodeString&) changed in ICU 4.8. When an unknown time zone ID is passed to the method, earlier versions returned a TimeZone instance with ID "GMT" (offset 0 and no daylight saving time). In ICU 4.8 and later, the method uses "Etc/Unknown" as the time zone ID in this case (still with offset 0 and no daylight saving time). Existing software that checks the returned time zone ID to validate the input ID may need to be updated to support the new behavior.
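Code that used to detect a bad ID by comparing the returned zone's ID with "GMT" now also needs to recognize "Etc/Unknown". The sketch below shows one way to express that check in plain C++. It assumes the application has already obtained the returned zone's ID (for example with TimeZone::getID) and converted it to a UTF-8 std::string; the helper name isFallbackZone is hypothetical and is not part of the ICU API.

```cpp
#include <string>

// Hypothetical helper (not an ICU API): decides whether the zone returned by
// TimeZone::createTimeZone(requestedId) is the fallback for an unknown ID.
// returnedId is assumed to be the returned zone's ID, e.g. obtained with
// TimeZone::getID() and converted to UTF-8 by the caller.
bool isFallbackZone(const std::string& requestedId,
                    const std::string& returnedId) {
    // ICU 4.8 and later: unknown IDs yield the special ID "Etc/Unknown".
    if (returnedId == "Etc/Unknown") {
        return true;
    }
    // Earlier releases returned "GMT" instead. That value is ambiguous
    // because "GMT" is also a valid ID, so treat it as a fallback only
    // when the caller did not actually ask for "GMT".
    return returnedId == "GMT" && requestedId != "GMT";
}
```

Accepting both values lets the same validation code run against ICU releases from before and after the change.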

C++ namespace support required

ICU4C 49 requires C++ namespace support. As a result, instead of writing U_NAMESPACE_QUALIFIER UnicodeString, you can now simply write icu::UnicodeString.

One shared platform.h

ICU4C 49 no longer generates any source code files via autoconf. Instead, platform.h is now a normal source header file that determines platform-specific settings via preprocessor tests (#if ... etc.).

As a result, it is easier to cross-compile ICU4C and/or use different build systems. Headers are no longer #included from the build-output directory, and all platforms use the same set of source code files.

However, it is likely that ICU4C 49 will not compile on some platforms (non-POSIX and/or older/unusual compilers etc.) that the ICU team did not test. As a temporary workaround, any platform-dependent macro for which platform.h does not determine the correct value can be predefined via CPPFLAGS or by adding an explicit #define ... to platform.h before the point where it first tests that macro.
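The workaround relies on platform.h leaving an already-defined macro alone and only computing a value when none was supplied. The fragment below is a simplified sketch of that "use the predefined value, otherwise detect" pattern, using a made-up macro name, MYLIB_HAVE_FEATURE (hypothetical, not an ICU macro). Passing -DMYLIB_HAVE_FEATURE=0 in CPPFLAGS, or adding #define MYLIB_HAVE_FEATURE 0 above the test, would override the detected default.

```cpp
// Simplified sketch of a "use the predefined value, else detect" test.
// MYLIB_HAVE_FEATURE is a hypothetical macro name used for illustration.
#ifdef MYLIB_HAVE_FEATURE
    /* Use the predefined value (e.g. from -DMYLIB_HAVE_FEATURE=0 in CPPFLAGS). */
#elif defined(_WIN32)
#   define MYLIB_HAVE_FEATURE 0
#else
#   define MYLIB_HAVE_FEATURE 1
#endif

// Code later in the build only tests the macro's value, so an override
// supplied on the command line takes effect everywhere.
constexpr bool haveFeature() {
    return MYLIB_HAVE_FEATURE != 0;
}
```

Because the detection branches are skipped whenever the macro is already defined, a single -D flag is enough to correct a wrong guess without editing the header.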

Please submit a bug ticket per platform with details about your compiler, its version and its predefined macros. (For example, preprocessing an empty source file with gcc's -dM option outputs all of gcc's predefined macros: gcc -E -dM -x c /dev/null | sort) A patch to fix the problem would be welcome too!
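Alongside the full gcc -E -dM dump, a handful of compiler and OS identification macros are often the quickest details to check. The sketch below collects a few widely used ones (__clang__, __GNUC__, _MSC_VER, __linux__, _WIN32, __APPLE__, __cplusplus); the selection is illustrative and is not the set of macros ICU itself tests, and the function name describeCompiler is hypothetical.

```cpp
#include <string>

// Hypothetical helper: summarizes a few common predefined macros so they can
// be pasted into a bug report. The selection of macros is illustrative only.
std::string describeCompiler() {
    std::string out;
    // clang also defines __GNUC__, so test for it first.
#if defined(__clang__)
    out += "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__) + "\n";
#elif defined(__GNUC__)
    out += "gcc " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__) + "\n";
#elif defined(_MSC_VER)
    out += "MSVC _MSC_VER=" + std::to_string(_MSC_VER) + "\n";
#else
    out += "unknown compiler\n";
#endif
#if defined(__linux__)
    out += "__linux__ defined\n";
#endif
#if defined(_WIN32)
    out += "_WIN32 defined\n";
#endif
#if defined(__APPLE__)
    out += "__APPLE__ defined\n";
#endif
    out += "__cplusplus=" + std::to_string(__cplusplus) + "\n";
    return out;
}
```

Printing the returned string with std::cout from a small main() yields text that can go into the ticket next to the complete macro dump.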

How To Download the Source Code

There are two ways to download ICU releases:

ICU Source Code Organization

In the descriptions below, <ICU> is the full path name of the ICU directory (the top level directory from the distribution archives) in your file system. You can also view the ICU Architectural Design section of the User's Guide to see which libraries you need for your software product. You need at least the data ([lib]icudt) and the common ([lib]icuuc) libraries in order to use ICU.

The following files describe the code drop.
File Description
readme.html Describes the International Components for Unicode (this file)
license.html Contains the text of the ICU license


The following directories contain source code and data files.
Directory Description
<ICU>/source/common/ The core Unicode and support functionality, such as resource bundles, character properties, locales, codepage conversion, normalization, Unicode properties, Locale, and UnicodeString.
<ICU>/source/i18n/ The i18n modules are generally the more data-driven (that is, resource-bundle-driven) components. These deal with higher-level internationalization issues such as formatting, collation, text break analysis, and transliteration.
<ICU>/source/layout/ Contains the ICU layout engine (not a rasterizer).
<ICU>/source/io/ Contains the ICU I/O library.
<ICU>/source/data/

This directory contains the source data in text format, which is compiled into binary form during the ICU build process. It contains several subdirectories, in which the data files are grouped by function. Note that the build process must be run again after any changes are made to this directory.

If some of the following directories are missing, it is probably because you downloaded an official release package. If you need the data source files for customization, please download the ICU source code from Subversion.

  • in/ A directory that contains a pre-built data library for ICU. A standard source code package contains this file but omits several of the following directories. This simplifies the build process for the majority of users and reduces platform porting issues.
  • brkitr/ Data files for character, word, sentence, title casing and line boundary analysis.
  • locales/ These .txt files contain ICU language and culture-specific localization data. Two special bundles are root, which is the fallback data and parent of other bundles, and index, which contains a list of installed bundles. The makefile resfiles.mk contains the list of resource bundle files.
  • mappings/ This directory contains the code page converter tables. These .ucm files contain mappings to and from Unicode and are compiled into .cnv files. convrtrs.txt is the alias mapping table from various converter name formats to ICU internal format and vice versa. It produces cnvalias.icu. The makefiles ucmfiles.mk, ucmcore.mk, and ucmebcdic.mk contain the list of converters to be built.
  • translit/ This directory contains transliterator rules as resource bundles, a makefile trnsfiles.mk containing the list of installed system transliteration files, as well as the special bundle translit_index, which lists the system transliterator aliases.
  • unidata/ This directory contains the Unicode data files. Please see http://www.unicode.org/ for more information.
  • misc/ The misc directory contains other data files that do not fit into the above categories. Currently it only contains time zone information and a name preparation file for IDNA.
  • out/ This directory contains the assembled memory mapped files.
  • out/build/ This directory contains intermediate (compiled) files, such as .cnv, .res, etc.
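Because root is the fallback data and parent of other bundles, a resource missing from a specific locale bundle is looked up in progressively less specific bundles, ending at root. The sketch below models only the simplest form of that chain, obtained by repeatedly stripping the last "_" subtag; it deliberately ignores any explicit parent overrides in the locale data and other special cases of real ICU resource lookup, and fallbackChain is a hypothetical name rather than an ICU function.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists the bundle names searched for a locale ID by
// repeatedly stripping the last "_" subtag, ending with "root".
// Simplified: explicit parent overrides and other special cases are ignored.
std::vector<std::string> fallbackChain(std::string localeId) {
    std::vector<std::string> chain;
    while (!localeId.empty()) {
        chain.push_back(localeId);
        std::string::size_type pos = localeId.rfind('_');
        if (pos == std::string::npos) {
            break;  // a bare language code such as "de" falls back to root
        }
        localeId.erase(pos);  // e.g. "de_CH" becomes "de"
    }
    chain.push_back("root");
    return chain;
}
```

For example, fallbackChain("de_CH") yields de_CH, de, root, which is why data placed in root acts as the default for every locale that does not override it.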

If you are creating a special ICU build, you can set the ICU_DATA environment variable to the out/ or the out/build/ directories, but this is generally discouraged because most people set it incorrectly. You can view the ICU Data Management section of the ICU User's Guide for details.

<ICU>/source/test/intltest/ A test suite covering all C++ APIs. For information about running the test suite, see the build instructions specific to your platform later in this document.
<ICU>/source/test/cintltst/ A test suite written in C, covering all C APIs. For information about running the test suite, see the build instructions specific to your platform later in this document.
<ICU>/source/test/iotest/ A test suite written in C and C++ to test the icuio library. For information about running the test suite, see the build instructions specific to your platform later in this document.
<ICU>/source/test/testdata/ Source text files for data, which are read by the tests. It contains the subdirectories out/build/ which is used for intermediate files, and out/ which contains testdata.dat.
<ICU>/source/tools/ Tools for generating the data files. Data files are generated by invoking <ICU>/source/data/build/makedata.bat on Win32 or by running make in <ICU>/source on UNIX.
<ICU>/source/samples/ Various sample programs that use ICU.
<ICU>/source/extra/ Non-supported API additions. Currently, it contains the 'uconv' tool to perform codepage conversion on files.
<ICU>/packaging/ This directory contains scripts and tools for packaging the final ICU build for various release platforms.
<ICU>/source/config/ Contains helper makefiles for platform specific build commands. Used by 'configure'.
<ICU>/source/allinone/ Contains top-level ICU workspace and project files, for instance to build all of ICU under one MSVC project.
<ICU>/include/ Contains the headers needed for developing software that uses ICU on Windows.
<ICU>/lib/ Contains the import libraries for linking ICU into your Windows application.
<ICU>/bin/ Contains the libraries and executables for using ICU on Windows.

How To Build And Install ICU

Recommended Build Options

Depending on the platform and the type of installation, we recommend a small number of modifications and build options.