Copyright (C) 1999-2001, International Business Machines Corporation 
and others.  All Rights Reserved.

Readme file for ICU time zone data (source/tools/gentz)

Alan Liu
Last updated 2 Feb 2001


RAW DATA
--------
The time zone data in ICU is taken from the UNIX data files at
ftp://elsie.nci.nih.gov/pub/tzdata<year>.  The other input to the
process is an alias table, described below.


BUILD PROCESS
-------------
Two tools are used to process the data into a format suitable for ICU:

   tz.pl    directory of raw data files + tz.alias -> tz.txt
   gentz    tz.txt -> tz.dat (memory-mappable binary file)

After gentz is run, standard ICU data tools are used to incorporate
tz.dat into the icudata module.  The tz.pl script is run manually;
everything else is automatic.
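To show which parts of this pipeline need rerunning, here is a minimal Python sketch of a make-style timestamp check over the two stages above. It assumes, without confirmation from this readme, that the build's automatic detection of a new .txt file (step 4 below) works by comparing modification times. It also assumes tz.alias, tz.txt and tz.dat sit in one hypothetical directory, and it leaves out the raw data directory for brevity. The names is_stale and plan are illustrative only.

```python
import os

def is_stale(inputs, output):
    """Make-style check: output needs rebuilding if it is missing or
    older than any of its inputs."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)

def plan(base_dir):
    """Return the outputs that need regenerating, in pipeline order."""
    # Hypothetical layout: all files live directly in base_dir.
    stages = [
        (["tz.alias"], "tz.txt"),  # manual stage: rerun tz.pl
        (["tz.txt"], "tz.dat"),    # automatic stage: gentz
    ]
    pending = []
    for inputs, output in stages:
        # Rebuild if an input is itself about to be regenerated,
        # or if the output is older than its inputs.
        upstream = any(name in pending for name in inputs)
        paths = [os.path.join(base_dir, name) for name in inputs]
        if upstream or is_stale(paths, os.path.join(base_dir, output)):
            pending.append(output)
    return pending
```

For example, touching tz.txt makes plan() report only tz.dat. Touching tz.alias reports both files, because a regenerated tz.txt forces gentz to run again.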

To incorporate the raw data from that source into ICU, take the
following steps:

1. Download the archive of current zone data.  This should be a file
   named something like tzdata1999j.tar.gz.  Use the URL listed above.

2. Unpack the archive into a directory, retaining the name of the
   archive.  For example, unpack tzdata1999j.tar.gz into tzdata1999j/.
   Place this directory anywhere; one option is to place it within
   source/tools/gentz.

3. Run the Perl script tz.pl, passing it the directory location as a
   command-line argument.  On Windows systems, use the batch file
   tz.bat.  Also specify one or more output files: .txt, .htm|.html,
   and .java.

   For ICU4C, specify the .txt and .htm files; typically:

     <icu>/data/timezone.txt <icu>/docs/tz.htm

   where <icu> is the ICU4C root directory.  Double-check that these
   are the correct locations and file names; they change periodically.

   As a third output file, pass in "tz.java".  This generates a Java
   source file that is used to update the ICU4J data.

4. Do a standard build.  The build scripts will automatically detect
   that a new .txt file is present and rebuild the binary data (using
   gentz) from that.
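Steps 1 through 3 can be sketched in Python as below. This is a hypothetical helper, not part of ICU. The names archive_dir_name, unpack and tz_pl_command are illustrative. The sketch assumes the archive is a gzip-compressed tar file and that tz.pl is invoked as "perl tz.pl <dir> <outputs...>". It only builds the tz.pl command line and does not run it.

```python
import os
import tarfile

# Output file types tz.pl accepts (step 3): .txt, .htm or .html, .java.
TZ_PL_OUTPUT_EXTS = (".txt", ".htm", ".html", ".java")

def archive_dir_name(archive):
    """Step 2: derive the directory name from the archive name,
    e.g. tzdata1999j.tar.gz -> tzdata1999j."""
    name = os.path.basename(archive)
    for suffix in (".tar.gz", ".tgz"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError("expected a .tar.gz archive: " + archive)

def unpack(archive, parent_dir):
    """Step 2: unpack the archive into parent_dir/<archive name>/."""
    target = os.path.join(parent_dir, archive_dir_name(archive))
    os.makedirs(target, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target, filter="data")  # refuse unsafe members
        else:
            tar.extractall(target)  # older Pythons without filters
    return target

def tz_pl_command(data_dir, outputs):
    """Step 3: build (but do not run) the tz.pl command line."""
    if not outputs:
        raise ValueError("tz.pl needs at least one output file")
    for out in outputs:
        if not out.lower().endswith(TZ_PL_OUTPUT_EXTS):
            raise ValueError("unsupported output file type: " + out)
    return ["perl", "tz.pl", data_dir, *outputs]
```

The list returned by tz_pl_command could then be passed to subprocess.run(cmd, check=True) from within source/tools/gentz. On Windows, tz.bat would be used instead.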

The .txt and .htm files are typically checked into CVS, whereas
the raw data files are not, since they are readily available from the
URL listed above.

Additional steps are required to update the ICU4J data.  First, you
must have a current, working installation of ICU4J.  These instructions
assume it is in the directory "/icu4j".

5. Copy the tz.java file generated in step 3 to /icu4j/tz.java.

6. Change to the /icu4j directory and compile the tz.java file, with
   /icu4j/classes on the classpath.

7. Run the resulting Java program (again with /icu4j/classes on the
   classpath) and capture the output in a file named tz.tmp.

8. Open /icu4j/src/com/ibm/util/TimeZoneData.java.  Delete the section
   that starts with the line "BEGIN GENERATED SOURCE CODE" and ends
   with the line "END GENERATED SOURCE CODE".  Replace it with the
   contents of tz.tmp.  If there are extraneous control-M characters
   or other similar problems, fix them.

9. Rebuild icu4j and make sure there are no build errors.  Rerun all
   the tests in /icu4j/src/com/ibm/test/timezone and make sure they
   all pass.  If all is well, check the new TimeZoneData.java into
   CVS.
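The edit in step 8 is mechanical enough to script. The Python sketch below uses a hypothetical function, splice_generated. It removes the lines from the one containing "BEGIN GENERATED SOURCE CODE" through the one containing "END GENERATED SOURCE CODE", inclusive. It then inserts the contents of tz.tmp in their place and drops control-M characters along the way. It assumes that tz.tmp contains its own BEGIN/END marker lines. If it does not, the markers would need to be added back so the next update can find the section.

```python
BEGIN_MARKER = "BEGIN GENERATED SOURCE CODE"
END_MARKER = "END GENERATED SOURCE CODE"

def splice_generated(java_source, generated):
    """Replace the generated section of TimeZoneData.java (step 8).

    Removes the lines from the one containing BEGIN_MARKER through the one
    containing END_MARKER, inclusive, and puts the lines of `generated`
    (the contents of tz.tmp) in their place. Assumes tz.tmp supplies its
    own marker lines.
    """
    # Dropping '\r' removes stray control-M characters (CRLF -> LF).
    lines = java_source.replace("\r", "").split("\n")
    new_lines = generated.replace("\r", "").rstrip("\n").split("\n")
    start = next((i for i, line in enumerate(lines) if BEGIN_MARKER in line), None)
    if start is None:
        raise ValueError("BEGIN marker not found")
    end = next((i for i in range(start + 1, len(lines)) if END_MARKER in lines[i]), None)
    if end is None:
        raise ValueError("END marker not found after BEGIN marker")
    return "\n".join(lines[:start] + new_lines + lines[end + 1:])
```

In use, one would read TimeZoneData.java and tz.tmp, call splice_generated, and write the result back before rebuilding in step 9.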


ALIAS TABLE
-----------
For backward compatibility, we define several three-letter IDs that
have been used since early ICU and correspond to IDs used in old JDKs.
These IDs are listed in tz.alias.  The tz.pl script processes this
alias table and issues errors if there are problems.
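This readme does not describe the format of tz.alias, nor exactly which problems tz.pl reports. Purely as an illustration, the Python sketch below assumes a hypothetical format: one "ALIAS TARGET" pair per line, with '#' starting a comment. It checks for three plausible problems. The first is an alias that collides with an existing zone ID. The second is an alias whose target zone is unknown. The third is an alias defined twice. check_aliases is an illustrative name and is not part of tz.pl.

```python
def check_aliases(alias_lines, zone_ids):
    """Return error messages for a hypothetical alias table.

    Assumed format (not confirmed by this readme): one "ALIAS TARGET"
    pair per line; '#' starts a comment.
    """
    errors = []
    seen = {}  # alias -> line number of its first definition
    for lineno, raw in enumerate(alias_lines, start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            errors.append(f"line {lineno}: expected 'ALIAS TARGET'")
            continue
        alias, target = fields
        if alias in zone_ids:
            errors.append(f"line {lineno}: {alias} is already a zone ID")
        if target not in zone_ids:
            errors.append(f"line {lineno}: unknown target zone {target}")
        if alias in seen:
            errors.append(f"line {lineno}: {alias} already defined on line {seen[alias]}")
        else:
            seen[alias] = lineno
    return errors
```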


IDS
---
All *system* zone IDs must consist only of characters in the invariant
set.  See utypes.h for an explanation of what this means.  If an ID is
encountered that contains a non-invariant character, tz.pl complains.
Non-system zones may use non-invariant characters.
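A check like the one tz.pl performs can be sketched in Python. The invariant set used here is printable ASCII except !#$@[\]^`{|}~. That set is an assumption based on ICU's definition and should be verified against utypes.h. Control characters are ignored because they do not appear in zone IDs. The function names are hypothetical.

```python
# ASCII characters assumed NOT to be invariant (check against utypes.h).
_VARIANT = set('!#$@[\\]^`{|}~')
# Printable invariant characters: printable ASCII minus the variant ones.
# Control characters are ignored since they never occur in zone IDs.
INVARIANT = {chr(c) for c in range(0x20, 0x7F)} - _VARIANT

def non_invariant_chars(zone_id):
    """Return the characters of zone_id outside the invariant set, in order."""
    return [ch for ch in zone_id if ch not in INVARIANT]

def check_system_ids(zone_ids):
    """Map each offending system zone ID to its non-invariant characters."""
    return {zid: bad for zid in zone_ids if (bad := non_invariant_chars(zid))}
```

Only system zone IDs would be passed to check_system_ids, since non-system zones may legitimately use other characters.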


Etc/GMT...
----------
Users may be confused by the fact that various zones with names of the
form Etc/GMT+n appear to have an offset of the wrong sign.  For
example, Etc/GMT+8 is 8 hours *behind* GMT; that is, it corresponds to
what one typically sees displayed as "GMT-8:00".  The reason for this
inversion is explained in the UNIX zone data file "etcetera".
Briefly, this is done intentionally in order to comply with
POSIX-style signedness.  In ICU we reproduce the UNIX zone behavior
faithfully, including this confusing aspect.
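The sign inversion can be made concrete with a small Python sketch. The function names are hypothetical. The sketch converts an Etc/GMT[+-]n ID to a conventional raw offset and formats it the way it is typically displayed. Only IDs of the exact form Etc/GMT, Etc/GMT+n and Etc/GMT-n are handled.

```python
import re

_ETC_GMT = re.compile(r"Etc/GMT(?:([+-])(\d{1,2}))?")

def etc_gmt_offset_seconds(zone_id):
    """Conventional raw offset from GMT, in seconds, for an Etc/GMT zone.

    POSIX-style signedness: the sign in the name is the inverse of the
    usual convention, so Etc/GMT+8 is 8 hours *behind* GMT.
    """
    m = _ETC_GMT.fullmatch(zone_id)
    if m is None:
        raise ValueError("not an Etc/GMT zone: " + zone_id)
    sign, hours = m.groups()
    if sign is None:
        return 0  # plain Etc/GMT
    offset = int(hours) * 3600
    return -offset if sign == "+" else offset

def display_offset(seconds):
    """Format an offset as it is typically displayed, e.g. GMT-8:00."""
    sign = "-" if seconds < 0 else "+"
    hours, rem = divmod(abs(seconds), 3600)
    return f"GMT{sign}{hours}:{rem // 60:02d}"
```

For example, Etc/GMT+8 yields an offset of -28800 seconds, which display_offset renders as "GMT-8:00".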