<html> <head> <meta http-equiv="Content-Language" content="en-us"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <title>UnicodeTools</title> </head> <body>
<h1>UnicodeTools</h1>
<p>This file provides instructions for building and running the UnicodeTools, which<br> can be used to:</p>
<ul>
<li>build the Derived Unicode files in the UCD (Unicode Character Database),</li>
<li>build the transformed UCA (Unicode Collation Algorithm) files needed by ICU,</li>
<li>run consistency checks on beta releases of the UCD and the UCA,</li>
<li>build the four chart folders on the Unicode site.</li>
</ul>
<p><font color="#FF0000"><b>WARNING!!</b></font></p>
<ul>
<li>This is NOT production-level code, and should never be used in programs.</li>
<li>The API is subject to change without notice, and will not be maintained.</li>
<li>The source is uncommented and has many warts; since it is not production code, it has not been worth the time to clean it up.</li>
<li>It will probably not work on Unix or Mac without changing the file separator.</li>
<li>Currently it uses hard-coded directory names.</li>
<li>The contents of multiple versions of the UCD must be copied to a local directory, as described below.</li>
</ul>
<h2>Instructions:</h2>
<h3>0. You will need to get ICU4J on your system, using CVS.</h3>
<p>The rest of this document assumes that you have set up CVS so that the ICU4J project is loaded into C:\ICU4J.<br> <br> You need both the main icu4j project and a subproject called unicodetools. See: <a href="http://ibm.com/software/globalization/icu/repository.jsp"> http://ibm.com/software/globalization/icu/repository.jsp</a>. Inside unicodetools, look at com/ibm/text. The main directories of interest are UCD, UCA and utility.</p>
<h4>0a. 
If you are using Eclipse for your IDE, look at the instructions on <a href="http://icu.sourceforge.net/docs/eclipse_howto/eclipse_howto.html"> http://oss.software.ibm.com/icu/docs/eclipse_howto/eclipse_howto.html</a> </h4>
<p>Set up Eclipse to build two projects, ICU4J and UnicodeTools:<br> <br> Project Name: ICU4J<br> Directory: C:\ICU4J\icu4j<br> Default Output Folder: ICU4J/classes<br> <br> Project Name: UnicodeTools<br> Directory: C:\ICU4J\unicodetools<br> Default Output Folder: UnicodeTools/classes<br> <br> After Eclipse is set up with these, exclude certain files from UnicodeTools:<br> <br> Right-Click UnicodeTools &gt; Properties &gt; Java Build Path &gt; Exclusions<br> com/ibm/rbm/<br> com/ibm/text/utility/UnicodeMapInt.java<br> com/ibm/text/utility/TestUtility.java<br> com/ibm/text/UCD/GenerateThaiBreaks-old.java/<br> com/ibm/text/UCD/ProcessUnihan.java/<br> com/ibm/text/UCA/WriteHTMLCollation.java/<br> <br> UnicodeTools must also include the ICU4J project, with<br> <br> Right-Click UnicodeTools &gt; Properties &gt; Java Build Path &gt; Projects</p>
<h3>1. In UCD, edit the top of UCD_Types.java to set the directories for the build:</h3>
<p>public static final String DATA_DIR = "C:\\DATA\\";<br> public static final String UCD_DIR = BASE_DIR + "UCD\\";<br> public static final String BIN_DIR = DATA_DIR + "BIN\\";<br> public static final String GEN_DIR = DATA_DIR + "GEN\\";<br> <br> Make sure that each of these directories exists. Also make sure that the following<br> exist:<br> <br> &lt;GEN_DIR&gt;/DerivedData<br> &lt;GEN_DIR&gt;/DerivedData/ExtractedProperties<br> &lt;UCD_DIR&gt;/EXTRAS-Update</p>
<h3>2. 
Download all of the UnicodeData files for each version into UCD_DIR.</h3>
<p>The folder names must be of the form "3.2.0-Update", so rename the folders from the<br> Unicode site to this format.</p>
<h4>2a. Ensure Complete Release</h4>
<p>If you are downloading any "incomplete" release (one that does not contain a complete set of data files for that release), you also need to download the previous complete release. Most of the N.M-Update directories are complete, *except*:</p>
<p>4.0-Update, which does not contain a copy of Unihan.txt and some other files<br> 3.1-Update, which does not contain a copy of BidiMirroring.txt</p>
<p>Also, make the following changes to UnicodeData for 1.1.5:</p>
<p><b>Delete:</b></p>
<pre>3400;HANGUL SYLLABLE KIYEOK A;Lo;0;L;1100 1161;;;;N;;;;;
4DFF;HANGUL SYLLABLE MIEUM WEO RIEUL-THIEUTH;Lo;0;L;1106 116F 11B4;;;;N;;;;;
4E00;&lt;CJK IDEOGRAPH REPRESENTATIVE&gt;;Lo;0;L;;;;;N;;;;;</pre>
<p><b>Add:</b></p>
<pre>4E00;&lt;CJK Ideograph, First&gt;;Lo;0;L;;;;;N;;;;;
9FA5;&lt;CJK Ideograph, Last&gt;;Lo;0;L;;;;;N;;;;;
E000;&lt;Private Use, First&gt;;Co;0;L;;;;;N;;;;;
F8FF;&lt;Private Use, Last&gt;;Co;0;L;;;;;N;;;;;</pre>
<p><b>And from a later version of Unicode, add:</b></p>
<pre>F900;CJK COMPATIBILITY IDEOGRAPH-F900;Lo;0;L;8C48;;;;N;;;;;
...
FA2D;CJK COMPATIBILITY IDEOGRAPH-FA2D;Lo;0;L;9DB4;;;;N;;;;;</pre>
<h4>2b. UCA data</h4>
<p>If you are building any of the UCA tools, you need to get a copy of the UCA data file<br> from http://www.unicode.org/reports/tr10/#AllKeys. The default location for this is:<br> <br> BASE_DIR + "Collation\allkeys" + VERSION + ".txt".<br> <br> If you have it in a different location, change that value for KEYS in UCA.java, and <br> the value for BASE_DIR.</p>
<h4>2c. Here is an example of the default directory structure with files:</h4>
<pre>C://DATA/
    BIN/
    Collation/
        allkeys-3.1.1.txt
    GEN/
        DerivedData/
            ExtractedProperties
    UCD/
        3.0.0-Update/
            Unihan-3.2.0.txt
            ...
        3.0.1-Update/
            ...
        3.1.0-Update/
            ...
        3.1.1-Update/
            ...
        3.2.0-Update/
            ...
        4.0.0-Update/
            ArabicShaping-4.0.0d14b.txt
            BidiMirroring-4.0.0d1b.txt
            ...
        EXTRAS-Update/</pre>
<h3>3. Versions</h3>
<p>All of the following have "version X" in the options you give to Java (either on the command line, or in the Eclipse 'run' options). If you want a specific version like 3.1.0, then you would write "version 3.1.0". If you want the latest version (4.1.0), you can omit the "version X".</p>
<h3>4. Running UCD, you will use com.ibm.text.UCD.Main as your main class.</h3>
<p>The working directory must be C:\ICU4J\unicodetools\com\ibm\text\UCD<br> (In Eclipse you can also use ${workspace_loc:UnicodeTools/com/ibm/text/UCD}, which abstracts away the location.)<br> <br> The same applies to UCA:</p>
<p>main: com.ibm.text.UCA.Main<br> directory: <a href="file:///C:/ICU4J/unicodetools/com/ibm/text/UCA"> C:\ICU4J\unicodetools\com\ibm\text\UCA</a></p>
<h4>4a. BIN</h4>
<p>For each version, the tools build a set of binary data in BIN that contains the information for that release. This is done automatically, or you can do it manually with the options<br> <br> version X build<br> <br> This builds a compressed format of all the UCD data (except blocks and Unihan) into the BIN directory. Don't worry about the voluminous console messages, unless one says "FAIL".<br> <br> <font color="#FF0000"><i>You have to do this manually if you change any of the data files in that version!!</i></font></p>
<p>Note: if for any reason you modify the binary format of the BIN files, you also have to bump the value in that file:<br> <br> static final byte BINARY_FORMAT = 8; // bumped if binary format of UCD changes</p>
<h4>4b. To build the Unicode files for a particular version X, run the Main with the following argument:</h4>
<p>MakeUnicodeFiles.generateFile</p>
<p>This will execute the commands in the file MakeUnicodeFiles.txt.</p>
<p>You will edit that file if you want a different 'd' version for the files, OR if you want to change which files are built. 
At the top of the file you will see the following text:</p>
<pre>Generate: </pre>
<pre>DeltaVersion: 7</pre>
<h4>4c. To change which files are built, put any number of regular expressions separated by spaces after Generate. E.g.,</h4>
<pre>Generate: .*line.* prop.*</pre>
<p>The matching is case-insensitive.</p>
<h4>4d. To change the 'd' number that is appended to the generated file names, change the DeltaVersion.</h4>
<h4>4e. To run basic consistency checking, run:</h4>
<p>version X verify<br> <br> Don't worry about any console messages except those that say FAIL.</p>
<h4>4f. Output</h4>
<p>The files will be generated in the GEN directories.</p>
<ul>
<li>If they are the same as the previous files (except for the first line and Date), they will be renamed to UNCHANGED... </li>
<li>If they are not, then a bat file will be generated in the DIFF directory. Double-clicking on this file will launch CompareIt, which is a nice diff program. Get CompareIt from <a class="xurl" href="http://www.grigsoft.com/files.htm">http://www.grigsoft.com/files.htm</a> (be sure to get the Unicode version); you can then also set it as the diff program in CVS with Admin/Preferences/WinCVS, External Diff = C:\Program Files\Compare It!\wincmp3.exe (or equiv).</li>
</ul>
<h3>5. Running UCA, you will use com.ibm.text.UCA.Main as your main class.</h3>
<h4>5a. To build all the UCA files used by ICU, use the option:</h4>
<p>java &lt;UCA&gt;Main ICU</p>
<h4>6. To build all the charts, use the UCA project, with options: normalizationChart caseChart scriptChart indexChart</h4>
</body> </html>