ICU-3650 Move some stuff from the readme.html to the User's Guide.
X-SVN-Rev: 15036
parent c76c19a587
commit eea9124ccf
@@ -80,11 +80,6 @@
<li><a href="#ImportantNotesCPlusPlus">Using ICU in a Multithreaded
Environment</a></li>

<li><a href="#CharStrings">char * strings in ICU</a></li>

<li><a href="#ImportantNotesDefaultCP">Using the Default
Codepage</a></li>

<li><a href="#ImportantNotesWindows">Windows Platform</a></li>

<li><a href="#ImportantNotesUNIX">UNIX Type Platforms</a></li>
@@ -1628,305 +1623,6 @@ del common/libicuuc.so
<a href=
"http://docs.sun.com/db/doc/806-6867/6jfpgdcob?a=view">http://docs.sun.com/db/doc/806-6867/6jfpgdcob?a=view</a></p>

<h3><a name="CharStrings" href="#CharStrings" id="CharStrings">char *
strings in ICU</a></h3>

<p>The C/C++ languages do not provide a portable way to specify Unicode
code point or string literals other than with arrays of numeric constants.
For convenience, ICU4C tends to use char * strings in places where only
"invariant characters" (a portable subset of the 7-bit ASCII repertoire)
are used. This allows locale IDs, charset names, resource bundle item keys
and similar items to be easily specified as string literals in the source
code. The same types of strings are also stored as "invariant character"
char * strings in the ICU data files.</p>

<p>ICU has hard-coded mapping tables in <code>source/common/putil.c</code>
to convert invariant characters to and from Unicode without using a full
ICU converter. These tables must match the encoding of string literals in
the ICU code as well as in the ICU data files.</p>

<p><strong>Important:</strong> ICU assumes that at least the invariant
characters always have the same codes as is common on platforms with the
same charset family (ASCII vs. EBCDIC). <em>ICU has not been tested on
platforms where this is not the case.</em></p>

<p>Some usage of char * strings in ICU assumes the system charset instead
of invariant characters. Such strings are only handled with the default
converter (see the following section). The system charset is usually a
superset of the invariant characters.</p>

<p>The following are the ASCII and EBCDIC byte values for all of the
invariant characters (see also unicode/utypes.h):</p>

<table border="1" summary=
"There are a few invariant characters that can be used for char * strings">
<tr><th>Character(s)</th><th>ASCII</th><th>EBCDIC</th></tr>
<tr><td>a..i</td><td>61..69</td><td>81..89</td></tr>
<tr><td>j..r</td><td>6A..72</td><td>91..99</td></tr>
<tr><td>s..z</td><td>73..7A</td><td>A2..A9</td></tr>
<tr><td>A..I</td><td>41..49</td><td>C1..C9</td></tr>
<tr><td>J..R</td><td>4A..52</td><td>D1..D9</td></tr>
<tr><td>S..Z</td><td>53..5A</td><td>E2..E9</td></tr>
<tr><td>0..9</td><td>30..39</td><td>F0..F9</td></tr>
<tr><td>(space)</td><td>20</td><td>40</td></tr>
<tr><td>"</td><td>22</td><td>7F</td></tr>
<tr><td>%</td><td>25</td><td>6C</td></tr>
<tr><td>&amp;</td><td>26</td><td>50</td></tr>
<tr><td>'</td><td>27</td><td>7D</td></tr>
<tr><td>(</td><td>28</td><td>4D</td></tr>
<tr><td>)</td><td>29</td><td>5D</td></tr>
<tr><td>*</td><td>2A</td><td>5C</td></tr>
<tr><td>+</td><td>2B</td><td>4E</td></tr>
<tr><td>,</td><td>2C</td><td>6B</td></tr>
<tr><td>-</td><td>2D</td><td>60</td></tr>
<tr><td>.</td><td>2E</td><td>4B</td></tr>
<tr><td>/</td><td>2F</td><td>61</td></tr>
<tr><td>:</td><td>3A</td><td>7A</td></tr>
<tr><td>;</td><td>3B</td><td>5E</td></tr>
<tr><td>&lt;</td><td>3C</td><td>4C</td></tr>
<tr><td>=</td><td>3D</td><td>7E</td></tr>
<tr><td>&gt;</td><td>3E</td><td>6E</td></tr>
<tr><td>?</td><td>3F</td><td>6F</td></tr>
<tr><td>_</td><td>5F</td><td>6D</td></tr>
</table>
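
<p>The table of invariant characters can be sketched as a small lookup
structure. The following is only an illustration built from the hexadecimal
byte values listed above; it is not the code in
<code>source/common/putil.c</code>, and the names <code>InvariantRun</code>,
<code>ascii_to_ebcdic</code>, <code>ebcdic_to_ascii</code> and
<code>is_invariant_ascii</code> are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* One table row: a run of consecutive characters, given by its first
 * ASCII byte, its first EBCDIC byte, and the number of characters.
 * Hypothetical type; values copied from the invariant-character table. */
typedef struct {
    uint8_t ascii_first;
    uint8_t ebcdic_first;
    uint8_t count;
} InvariantRun;

static const InvariantRun kRuns[] = {
    {0x61, 0x81, 9}, {0x6A, 0x91, 9}, {0x73, 0xA2, 8},   /* a..i j..r s..z */
    {0x41, 0xC1, 9}, {0x4A, 0xD1, 9}, {0x53, 0xE2, 8},   /* A..I J..R S..Z */
    {0x30, 0xF0, 10},                                    /* 0..9 */
    {0x20, 0x40, 1}, {0x22, 0x7F, 1}, {0x25, 0x6C, 1},   /* space " % */
    {0x26, 0x50, 1}, {0x27, 0x7D, 1}, {0x28, 0x4D, 1},   /* & ' ( */
    {0x29, 0x5D, 1}, {0x2A, 0x5C, 1}, {0x2B, 0x4E, 1},   /* ) * + */
    {0x2C, 0x6B, 1}, {0x2D, 0x60, 1}, {0x2E, 0x4B, 1},   /* , - . */
    {0x2F, 0x61, 1}, {0x3A, 0x7A, 1}, {0x3B, 0x5E, 1},   /* / : ; */
    {0x3C, 0x4C, 1}, {0x3D, 0x7E, 1}, {0x3E, 0x6E, 1},   /* < = > */
    {0x3F, 0x6F, 1}, {0x5F, 0x6D, 1},                    /* ? _ */
};

#define RUN_COUNT (sizeof kRuns / sizeof kRuns[0])

/* Returns the EBCDIC byte for an invariant ASCII byte, or -1 if the
 * byte is not an invariant character. */
static int ascii_to_ebcdic(uint8_t a) {
    for (size_t i = 0; i < RUN_COUNT; ++i) {
        if (a >= kRuns[i].ascii_first &&
            a - kRuns[i].ascii_first < kRuns[i].count) {
            return kRuns[i].ebcdic_first + (a - kRuns[i].ascii_first);
        }
    }
    return -1;
}

/* Returns the ASCII byte for an invariant EBCDIC byte, or -1 otherwise. */
static int ebcdic_to_ascii(uint8_t e) {
    for (size_t i = 0; i < RUN_COUNT; ++i) {
        if (e >= kRuns[i].ebcdic_first &&
            e - kRuns[i].ebcdic_first < kRuns[i].count) {
            return kRuns[i].ascii_first + (e - kRuns[i].ebcdic_first);
        }
    }
    return -1;
}

/* Returns 1 if every byte of s is an invariant character, else 0.
 * Assumes s holds ASCII-family bytes. */
static int is_invariant_ascii(const char *s) {
    for (; *s != '\0'; ++s) {
        if (ascii_to_ebcdic((uint8_t)*s) < 0) {
            return 0;
        }
    }
    return 1;
}
```

<p>With these definitions, a locale ID such as <code>en_US</code> passes
<code>is_invariant_ascii()</code>, while a string containing
<code>#</code> (ASCII 23) does not, because that character has no row in
the table.</p>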

<h3><a name="ImportantNotesDefaultCP" href="#ImportantNotesDefaultCP" id=
"ImportantNotesDefaultCP">Using the default codepage</a></h3>

<p>ICU has code to determine the default codepage of the system or process.
This default codepage can be used to convert <code>char *</code> strings to
and from Unicode.</p>

<p>Depending on system design, setup and APIs, it may not always be
possible to find a default codepage that fully works as expected. For
example,</p>

<ul>
<li>On Windows there are three encodings in use at the same time. Unicode
(UTF-16) is always used inside of Windows, while for <code>char *</code>
encodings there are two classes, called "ANSI" and "OEM" codepages. ICU
will use the ANSI codepage. Note that the OEM codepage is used by default
for console window output.</li>

<li>On some UNIX-type systems, non-standard names are used for encodings,
or non-standard encodings are used altogether. Although ICU supports over
200 encodings in its standard build and many more aliases for them, it
will not be able to recognize such non-standard names.</li>

<li>Some systems do not have a notion of a system or process codepage,
and may not have APIs for that.</li>
</ul>

<p>If you have means of detecting a default codepage name that are more
appropriate for your application, then you should set that name with
<code>ucnv_setDefaultName()</code> as the first ICU function call. This
makes sure that the internally cached default converter will be
instantiated from your preferred name.</p>

<p>Starting in ICU 2.0, when a converter for the default codepage cannot be
opened, a fallback default codepage name and converter will be used. On
most platforms, this will be US-ASCII. For z/OS (OS/390), ibm-1047-s390 is
the default fallback codepage. For AS/400 (iSeries), ibm-37 is the default
fallback codepage. This default fallback codepage is used when the
operating system is using a non-standard name for a default codepage, or
the converter was not packaged with ICU. The feature allows ICU to run in
unusual computing environments without completely failing.</p>
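
<p>The fallback rule described above can be sketched as follows. This is a
simplified illustration, not ICU's implementation: the <code>Platform</code>
enum and the functions <code>fallback_codepage</code> and
<code>choose_default_codepage</code> are hypothetical, and the exact
<code>strcmp</code> match stands in for ICU's real converter and alias
lookup. Only the three fallback names come from the text.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical platform tag; only the distinctions named in the text. */
typedef enum { PLATFORM_OTHER, PLATFORM_ZOS, PLATFORM_AS400 } Platform;

/* Fallback codepage names as listed in the paragraph above. */
static const char *fallback_codepage(Platform p) {
    switch (p) {
    case PLATFORM_ZOS:   return "ibm-1047-s390";  /* z/OS (OS/390) */
    case PLATFORM_AS400: return "ibm-37";         /* AS/400 (iSeries) */
    default:             return "US-ASCII";       /* most platforms */
    }
}

/* Returns the detected name if a converter with that exact name is in
 * available[]; otherwise returns the platform fallback. A NULL detected
 * name models a system with no notion of a default codepage. */
static const char *choose_default_codepage(const char *detected,
                                           const char *const *available,
                                           size_t n, Platform p) {
    if (detected != NULL) {
        for (size_t i = 0; i < n; ++i) {
            if (strcmp(detected, available[i]) == 0) {
                return detected;
            }
        }
    }
    return fallback_codepage(p);
}
```

<p>In this sketch, a non-standard system name that matches no packaged
converter yields US-ASCII on most platforms, which mirrors the stated goal
of continuing to run rather than failing outright.</p>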

<h3><a name="ImportantNotesWindows" href="#ImportantNotesWindows" id=
"ImportantNotesWindows">Windows Platform</a></h3>