<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<link rel="stylesheet" href="charts.css" type="text/css">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>UCA Chart Help</title>
<base target="main">
</head>

<body>

<h2 align="center">UCA Chart Help</h2>
<p>This set of charts shows the Unicode Collation Algorithm (UCA) values for
Unicode characters. The characters are arranged in the following groups:</p>
<table cellspacing="0" cellpadding="4">
<tr>
<th align="left"><i>Null</i></th>
<th class="x">Completely ignorable (at the primary, secondary, and tertiary levels).<br>
These include control codes and various formatting codes.</th>
</tr>
<tr>
<th align="left"><i>Ignorable</i></th>
<th class="x">Ignorable at the primary level, but not at the secondary or
tertiary level.<br>
These include most accents and diacritics.</th>
</tr>
<tr>
<th align="left"><i>Variable</i></th>
<th class="x">Characters that may be set to ignorable by a programmatic
switch.<br>
These include spaces, punctuation marks, and most symbols.</th>
</tr>
<tr>
<th align="left"><i>Common</i></th>
<th class="x">Characters that are in none of the above groups but are not
considered letters.<br>
These include numbers, currency symbols, etc.</th>
</tr>
<tr>
<th align="left"><i>Letters</i></th>
<th class="x">Letters, arranged by script</th>
</tr>
<tr>
<th align="left"><i>Unsupported</i></th>
<th class="x">Not explicitly supported in this version of the UCA; sorted in
code point order</th>
</tr>
</table>
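<p>The first three groups can be illustrated with a short Python sketch. It
assumes the usual UCA convention that a weight of zero means "ignorable at
that level"; the function name <tt>classify</tt>, its <tt>variable</tt> flag,
and the sample weights in the usage note are hypothetical and are not taken
from the UCA data table. Weights alone cannot separate <i>Common</i> from
<i>Letters</i>, so the sketch reports those two groups together.</p>
<pre>
```python
def classify(primary, secondary, tertiary, variable=False):
    """Suggest a chart group for one collation element (illustrative only)."""
    # All three weights zero: completely ignorable.
    if primary == 0 and secondary == 0 and tertiary == 0:
        return "Null"
    # Zero primary weight, but something at a lower level: ignorable at primary.
    if primary == 0:
        return "Ignorable"
    # The data marks the element as variable (hypothetical flag).
    if variable:
        return "Variable"
    # Telling Common from Letters needs more than the weights.
    return "Common or Letters"
```
</pre>
<p>For example, <tt>classify(0, 0x21, 0x2)</tt> returns <tt>"Ignorable"</tt>,
which is the pattern expected for an accent under these assumptions.</p>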
<p>The characters* within each group are arranged in cells. The color of each
cell indicates the strength of the difference between that character and the <i>previous</i>
character in the chart, as follows:</p>
<table cellspacing="0" cellpadding="4">
<tr>
<th colspan="2"><font size="3"><u>No Expansion</u></font></th>
<th rowspan="5"></th>
<th colspan="2"><font size="3"><u>Expansion</u></font></th>
</tr>
<tr>
<td class="p">a<br>
<tt>0061</tt></td>
<th class="x">Primary difference</th>
<td class="ep">&#x01F3;<br>
<tt>01F3</tt></td>
<th class="x">Primary difference</th>
</tr>
<tr>
<td class="s">á<br>
<tt>00E1</tt></td>
<th class="x">Secondary difference</th>
<td class="es">&#x01F1;<br>
<tt>01F1</tt></td>
<th class="x">Secondary difference</th>
</tr>
<tr>
<td class="t">A<br>
<tt>0041</tt></td>
<th class="x">Tertiary difference</th>
<td class="et">&#x01F2;<br>
<tt>01F2</tt></td>
<th class="x">Tertiary difference</th>
</tr>
<tr>
<td class="q">&#x212B;<br>
<tt>212B</tt></td>
<th class="x">Quaternary difference<br>
or no difference</th>
<td class="eq"> </td>
<th class="x">Quaternary difference<br>
or no difference</th>
</tr>
</table>
<blockquote>
<p align="left"><b>Note: </b>If tooltips are enabled in your browser, pausing
the mouse over any cell shows the name of the character and a representation
of its sort key. In this representation, the weight levels are separated by
"|".</p>
</blockquote>
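<p>The relationship between sort keys and cell colors can be sketched in
Python. This is a minimal illustration, not the code used to generate the
charts. It assumes that the tooltip text separates levels with "|" and, as a
further assumption, that the weights within a level are separated by spaces.
The function names and the sample keys in the usage note are hypothetical; the
class names <tt>p</tt>, <tt>s</tt>, <tt>t</tt>, and <tt>q</tt> are those of
the non-expansion cells in the table above.</p>
<pre>
```python
LEVELS = ("primary", "secondary", "tertiary")

# Cell classes used for non-expansion cells in the color table.
CSS_CLASS = {
    "primary": "p",
    "secondary": "s",
    "tertiary": "t",
    "quaternary or none": "q",
}


def split_levels(sort_key_text):
    """Split a "|"-separated sort key into three lists of weight strings."""
    levels = [level.split() for level in sort_key_text.split("|")]
    # Pad with empty levels so that a missing level compares as empty.
    levels += [[] for _ in LEVELS]
    return levels[:len(LEVELS)]


def difference_strength(previous_key, current_key):
    """Name the first level at which two sort keys differ."""
    pairs = zip(LEVELS, split_levels(previous_key), split_levels(current_key))
    for name, previous, current in pairs:
        if previous != current:
            return name
    # The first three levels match: quaternary difference or no difference.
    return "quaternary or none"
```
</pre>
<p>With the made-up keys <tt>"15EF | 0020 | 0002"</tt> and
<tt>"15EF | 0020 0032 | 0002 0002"</tt>, <tt>difference_strength</tt> returns
<tt>"secondary"</tt>, so the second cell would use class <tt>s</tt>.</p>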
<table>
<tr>
<th>*</th>
<th class="x">In some cases, the UCA data table also includes contractions.<br>
They can be recognized by their multiple code points, as in the
following:</th>
<td class="p">&#x0B92;&#x0BD7;<br>
<tt>0B92 0BD7</tt></td>
</tr>
</table>
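<p>Recognizing a contraction from its cell label can be sketched as follows.
The sketch assumes that a label is a list of hexadecimal code points separated
by spaces, as in the example above; the function names are hypothetical.</p>
<pre>
```python
def parse_label(label):
    """Turn a cell label such as "0B92 0BD7" into a list of code points."""
    return [int(token, 16) for token in label.split()]


def is_contraction(label):
    """A label that lists two or more code points denotes a contraction."""
    return len(parse_label(label)) >= 2


def label_to_text(label):
    """Build the character sequence that the label describes."""
    return "".join(chr(code_point) for code_point in parse_label(label))
```
</pre>
<p>Here <tt>is_contraction("0B92 0BD7")</tt> is true, while
<tt>is_contraction("0061")</tt> is false.</p>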
<h3><b>Notes</b></h3>
<ul>
<li>The UCA results are versioned <i>both</i> by the version of the UCA <i>and</i>
by the version of the Unicode Standard used to process the data.</li>
<li>These charts show only one of the alternatives for handling variable
characters (such as punctuation): the one in which these characters are <b>non-ignorable</b>.</li>
<li>Characters from large blocks, such as CJK Ideographs, Hangul Syllables,
and the Private Use Area, are represented by a sample.</li>
<li>Some unassigned code points, noncharacters, and other edge cases are also
included in the list for comparison.</li>
<li>For more information, see <a href="http://www.unicode.org/unicode/reports/tr10/" target="_top">UTS
#10: Unicode Collation Algorithm</a>.</li>
</ul>

</body>

</html>