scuffed-code/tools/unicodetools/com/ibm/text/UCA/help.html
Mark Davis 31bee02d7f misc. updates
X-SVN-Rev: 8714
2002-05-29 02:01:00 +00:00


<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<link rel="stylesheet" href="charts.css" type="text/css">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>UCA Chart Help</title>
<base target="main">
</head>
<body>
<h2 align="center">UCA Chart Help</h2>
<p>This set of charts shows the Unicode Collation Algorithm values for Unicode
characters. The characters are arranged in the following groups:</p>
<table cellspacing="0" cellpadding="4">
<tr>
<th align="left"><i>Null</i></th>
<th class="x">Completely ignorable (primary, secondary and tertiary levels)<br>
These include control codes and various formatting codes.</th>
</tr>
<tr>
<th align="left"><i>Ignorable</i></th>
<th class="x">Ignorable at a primary level, but not at a secondary or
tertiary level.<br>
These include most accents and diacritics.</th>
</tr>
<tr>
<th align="left"><i>Variable</i></th>
<th class="x">Characters that may be set to ignorable by a programmatic
switch.<br>
These include spaces, punctuation marks, and most symbols.</th>
</tr>
<tr>
<th align="left"><i>Common</i></th>
<th class="x">Characters that are none of the above but are not considered
letters.<br>
These include numbers, currency symbols, etc.</th>
</tr>
<tr>
<th align="left"><i>Letters</i></th>
<th class="x">According to script</th>
</tr>
<tr>
<th align="left"><i>Unsupported</i></th>
<th class="x">Not explicitly supported in this version of UCA; uses
code-point order</th>
</tr>
</table>
<p>The characters* within each group are arranged in cells. The color of the
cell indicates the strength of the difference between that character and the <i>previous</i>
character in the chart, as follows.</p>
<table cellspacing="0" cellpadding="4">
<tr>
<th colspan="2"><font size="3"><u>No Expansion</u></font></th>
<th rowspan="5">&nbsp;</th>
<th colspan="2"><font size="3"><u>Expansion</u></font></th>
</tr>
<tr>
<td class="p">a<br>
<tt>0061</tt></td>
<th class="x">Primary difference</th>
<td class="ep">dz<br>
<tt>01F3</tt></td>
<th class="x">Primary difference</th>
</tr>
<tr>
<td class="s">á<br>
<tt>00E1</tt></td>
<th class="x">Secondary difference</th>
<td class="es">DZ<br>
<tt>01F1</tt></td>
<th class="x">Secondary difference</th>
</tr>
<tr>
<td class="t">A<br>
<tt>0041</tt></td>
<th class="x">Tertiary difference</th>
<td class="et">Dz<br>
<tt>01F2</tt></td>
<th class="x">Tertiary difference</th>
</tr>
<tr>
<td class="q">&#x212B;<br>
<tt>212B</tt></td>
<th class="x">Quaternary difference<br>
or no difference</th>
<td class="eq">&nbsp;</td>
<th class="x">Quaternary difference<br>
or no difference</th>
</tr>
</table>
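<p>The difference levels in the table above can be sketched in code. The
following Python example is an illustration only: the table <tt>TOY_ELEMENTS</tt>,
its weights, and the helper names are invented for this sketch and are not
taken from the UCA data or from the tool that generates these charts. It
compares two strings level by level and reports the first level at which
they differ.</p>
<pre>
```python
# Toy collation elements, one (primary, secondary, tertiary) tuple each.
# These weights are invented for illustration; they are NOT real UCA data.
TOY_ELEMENTS = {
    "a": [(10, 1, 1)],
    "\u00e1": [(10, 1, 1), (0, 2, 1)],  # a followed by a primary-ignorable accent
    "A": [(10, 1, 2)],
    "b": [(11, 1, 1)],
}

LEVEL_NAMES = ["primary", "secondary", "tertiary"]


def level_weights(text):
    """Collect the non-zero weights of each level, in order."""
    elements = [ce for ch in text for ce in TOY_ELEMENTS[ch]]
    return [[ce[i] for ce in elements if ce[i] != 0] for i in range(3)]


def difference(previous, current):
    """Name the first level at which the two strings differ."""
    pairs = zip(LEVEL_NAMES, level_weights(previous), level_weights(current))
    for name, prev_w, cur_w in pairs:
        if prev_w != cur_w:
            return name
    return "quaternary or none"


print(difference("a", "b"))        # primary
print(difference("a", "\u00e1"))   # secondary
print(difference("a", "A"))        # tertiary
```
</pre>
<p>With these toy weights, the accent changes only the secondary level and
the case changes only the tertiary level, which mirrors the sample cells in
the table.</p>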
<blockquote>
<p align="left"><b>Note: </b>If tool-tips are enabled in your browser,
pausing the mouse over any cell shows the name of the character and a
representation of its sort key. In this representation, the weight levels
are separated by &quot;|&quot;.</p>
</blockquote>
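<p>As a rough sketch of such a representation, the Python example below joins
the weights of each level and places &quot;|&quot; between levels. The exact
layout used in the tool-tips is not specified here, so the four-digit
hexadecimal format, the spacing, and the function name
<tt>format_sort_key</tt> are assumptions, and the weights are invented.</p>
<pre>
```python
def format_sort_key(levels):
    """Join each level's weights as 4-digit hex, with "|" between levels."""
    return " | ".join(
        " ".join(format(weight, "04X") for weight in weights)
        for weights in levels
    )


# Hypothetical weights for an accented letter (not real UCA values):
# one primary weight, two secondary weights, two tertiary weights.
levels = [[0x0A], [0x01, 0x02], [0x01, 0x01]]
print(format_sort_key(levels))  # 000A | 0001 0002 | 0001 0001
```
</pre>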
<table>
<tr>
<th>*</th>
<th class="x">In some cases, the UCA data table also includes contractions.<br>
They can be recognized by the multiple code point numbers, as in the
following:</th>
<td class="p">ஔ<br>
<tt>0B92 0BD7</tt></td>
</tr>
</table>
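<p>One way to picture a contraction is as a table entry whose key is a
sequence of code points rather than a single code point. The Python sketch
below looks up the longest matching key first. The table
<tt>TOY_TABLE</tt>, its weights, and the function name are invented for
illustration, and the sketch handles only contiguous matches, which is a
simplification of the matching rules in the UCA.</p>
<pre>
```python
# Toy table keyed by code point sequences. The weights are invented; only the
# idea of a multi-code-point key (a contraction) comes from the text above.
TOY_TABLE = {
    (0x0B92,): [(20, 1, 1)],
    (0x0BD7,): [(0, 3, 1)],
    (0x0B92, 0x0BD7): [(21, 1, 1)],  # contraction: two code points, one entry
}
MAX_KEY_LENGTH = max(len(key) for key in TOY_TABLE)


def collation_elements(code_points):
    """Map code points to elements, preferring the longest table match."""
    result = []
    i = 0
    while i != len(code_points):
        # Try the longest candidate key first, then shorter ones.
        for length in range(MAX_KEY_LENGTH, 0, -1):
            key = tuple(code_points[i:i + length])
            if key in TOY_TABLE:
                result.extend(TOY_TABLE[key])
                i += len(key)
                break
        else:
            raise KeyError(hex(code_points[i]))
    return result


print(collation_elements([0x0B92, 0x0BD7]))  # [(21, 1, 1)]
print(collation_elements([0x0B92]))          # [(20, 1, 1)]
```
</pre>
<p>In this sketch the pair <tt>0B92 0BD7</tt> yields the single contraction
entry, while <tt>0B92</tt> on its own falls back to its individual entry.</p>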
<h3><b>Notes</b></h3>
<ul>
<li>The UCA results are versioned <i>both</i> by the version of the UCA <i>and</i>
by the version of The Unicode Standard used to process the data.</li>
<li>These charts show only one of the alternatives for handling variable
characters (punctuation): the one in which these characters are <b>non-ignorable</b>.</li>
<li>Characters from large blocks, such as CJK Ideographs, Hangul Syllables,
and the Private Use Area, are represented by a sampling.</li>
<li>Some unassigned code points, noncharacters and other edge cases are also
added to the list for comparison.</li>
<li>For more information, see <a href="http://www.unicode.org/unicode/reports/tr10/" target="_top">UTS
#10: Unicode Collation Algorithm</a>.</li>
</ul>
</body>
</html>