<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<link rel="stylesheet" href="charts.css" type="text/css">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<title>UCA Chart Help</title>
<base target="main">
</head>
<body>
<h2 align="center">UCA Chart Help</h2>
<p>This set of charts shows the Unicode Collation Algorithm (UCA) values for Unicode
characters. The characters are arranged in the following groups:</p>
<table>
<tr>
<th align="left"><i>Null</i></th>
<th class="x">Completely ignorable (at the primary, secondary, and tertiary levels).<br>
These include control codes and various formatting codes.</th>
</tr>
<tr>
<th align="left"><i>Ignorable</i></th>
<th class="x">Ignorable at the primary level, but not at the secondary or
tertiary level.<br>
These include most accents and diacritics.</th>
</tr>
<tr>
<th align="left"><i>Variable</i></th>
<th class="x">Characters that can be set to ignorable by a programmatic
switch.<br>
These include spaces, punctuation marks, and most symbols.</th>
</tr>
<tr>
<th align="left"><i>Common</i></th>
<th class="x">Characters that belong to none of the groups above but are not considered
letters.<br>
These include numbers, currency symbols, etc.</th>
</tr>
<tr>
<th align="left"><i>Letters</i></th>
<th class="x">Arranged according to script.</th>
</tr>
<tr>
<th align="left"><i>Unsupported</i></th>
<th class="x">Not explicitly supported in this version of UCA; sorted in
code-point order.</th>
</tr>
</table>
<p>Characters from large blocks, such as CJK Ideographs, Hangul Syllables, and the
Private Use Area, are represented by a sample. Some unassigned code
points, noncharacters, and other edge cases are also included in the list.</p>
<p>The characters* within each group are arranged in cells. The color of the
cell indicates the strength of the difference between that character and the <i>previous</i>
character in the chart, as follows.</p>
<table>
<tr>
<th colspan="2"><font size="3">No Expansion</font></th>
<th rowspan="5" width="20"></th>
<th colspan="2"><font size="3">Expansion</font></th>
</tr>
<tr>
<td class="p">a<br>
<tt>0061</tt></td>
<th class="x">Primary difference</th>
<td class="ep">dz<br>
<tt>01F3</tt></td>
<th class="x">Primary difference</th>
</tr>
<tr>
<td class="s">á<br>
<tt>00E1</tt></td>
<th class="x">Secondary difference</th>
<td class="es">DZ<br>
<tt>01F1</tt></td>
<th class="x">Secondary difference</th>
</tr>
<tr>
<td class="t">A<br>
<tt>0041</tt></td>
<th class="x">Tertiary difference</th>
<td class="et">Dz<br>
<tt>01F2</tt></td>
<th class="x">Tertiary difference</th>
</tr>
<tr>
<td class="q">Å<br>
<tt>212B</tt></td>
<th class="x">Quaternary difference<br>
or no difference</th>
<td class="eq">&nbsp;</td>
<th class="x">Quaternary difference<br>
or no difference</th>
</tr>
</table>
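<p>The coloring rule above can be sketched in a few lines of Python. This is an
illustration only, not the code used to generate the charts: the function name
<tt>difference_strength</tt>, the representation of a sort key as one list of
weights per level, and all weight values are assumptions made for the example.</p>

```python
# Hypothetical sketch: classify how strongly two sort keys differ.
LEVEL_NAMES = ["primary", "secondary", "tertiary", "quaternary"]


def difference_strength(prev_key, key):
    """Return the name of the first level at which the two keys differ.

    Each key is a list of levels (primary first); each level is a list of
    integer weights. Both keys are assumed to have the same number of levels.
    Returns "none" when every level is equal.
    """
    for name, prev_level, level in zip(LEVEL_NAMES, prev_key, key):
        if prev_level != level:
            return name
    return "none"


# Invented weights, chosen only to mirror the legend above.
key_a = [[10], [1], [1]]              # a
key_a_acute = [[10], [1, 2], [1, 1]]  # a with acute accent
key_cap_a = [[10], [1], [2]]          # A
key_b = [[11], [1], [1]]              # b

print(difference_strength(key_a, key_b))        # primary
print(difference_strength(key_a, key_a_acute))  # secondary
print(difference_strength(key_a, key_cap_a))    # tertiary
print(difference_strength(key_a, key_a))        # none
```

<p>In this sketch the earliest differing level decides the result, so a
primary difference takes precedence over any differences at later levels.</p>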
<p align="left">If tool-tips are enabled in your browser, pausing the
mouse over any cell displays a representation of the sort key. In this
representation, the weight levels are separated by
"|".</p>
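<p>As a rough illustration of such a representation, the following Python
sketch joins the weights of each level and places "|" between levels. The exact
layout of the tool-tip text is not specified on this page, so the function name
<tt>format_sort_key</tt>, the four-digit hexadecimal formatting, and the sample
weights are all assumptions.</p>

```python
def format_sort_key(levels):
    """Render a sort key as text, with "|" separating the weight levels.

    `levels` is a list of levels (primary first); each level is a list of
    integer weights. Weights are shown as four-digit uppercase hex.
    """
    return " | ".join(
        " ".join(format(weight, "04X") for weight in level)
        for level in levels
    )


# Invented weights for a single character.
print(format_sort_key([[0x0861], [0x0020], [0x0002]]))  # 0861 | 0020 | 0002
```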
<table>
<tr>
<th>*</th>
<th class="x">In some cases, the UCA data table also includes contractions.<br>
They can be recognized by their multiple code point numbers, as in the
following:</th>
<td class="p">ஔ<br>
<tt>0B92 0BD7</tt></td>
</tr>
</table>
</body>
</html>