ICU-3650 Move some stuff from the readme.html to the User's Guide.

X-SVN-Rev: 15036
George Rhoten 2004-04-24 01:38:52 +00:00
parent c76c19a587
commit eea9124ccf


@ -80,11 +80,6 @@
<li><a href="#ImportantNotesCPlusPlus">Using ICU in a Multithreaded
Environment</a></li>
<li><a href="#CharStrings">char * strings in ICU</a></li>
<li><a href="#ImportantNotesDefaultCP">Using the Default
Codepage</a></li>
<li><a href="#ImportantNotesWindows">Windows Platform</a></li>
<li><a href="#ImportantNotesUNIX">UNIX Type Platforms</a></li>
@ -1628,305 +1623,6 @@ del common/libicuuc.so
<a href=
"http://docs.sun.com/db/doc/806-6867/6jfpgdcob?a=view">http://docs.sun.com/db/doc/806-6867/6jfpgdcob?a=view</a></p>
<h3><a name="CharStrings" href="#CharStrings" id="CharStrings">char *
strings in ICU</a></h3>
<p>The C/C++ languages do not provide a portable way to specify Unicode
code point or string literals other than with arrays of numeric constants.
For convenience, ICU4C tends to use char * strings in places where only
"invariant characters" (a portable subset of the 7-bit ASCII repertoire)
are used. This allows locale IDs, charset names, resource bundle item keys
and similar items to be easily specified as string literals in the source
code. The same types of strings are also stored as "invariant character"
char * strings in the ICU data files.</p>
<p>ICU has hard-coded mapping tables in <code>source/common/putil.c</code>
to convert invariant characters to and from Unicode without using a full
ICU converter. These tables must match the encoding of string literals in
the ICU code as well as in the ICU data files.</p>
<p><strong>Important:</strong> ICU assumes that the invariant characters,
at a minimum, have the byte values that are usual for the platform's
charset family (ASCII or EBCDIC). <em>ICU has not been tested on
platforms where this is not the case.</em></p>
<p>Some ICU APIs interpret char * strings in the system charset rather
than as invariant characters. Such strings are converted only with the
default converter (see the following section). The system charset is
usually a superset of the invariant characters.</p>
<p>The following are the ASCII and EBCDIC byte values, in hexadecimal, for
all of the invariant characters (see also <code>unicode/utypes.h</code>):</p>
<table border="1" summary=
"There are a few invariant characters that can be used for char * strings">
<tr>
<th>Character(s)</th>
<th>ASCII</th>
<th>EBCDIC</th>
</tr>
<tr>
<td>a..i</td>
<td>61..69</td>
<td>81..89</td>
</tr>
<tr>
<td>j..r</td>
<td>6A..72</td>
<td>91..99</td>
</tr>
<tr>
<td>s..z</td>
<td>73..7A</td>
<td>A2..A9</td>
</tr>
<tr>
<td>A..I</td>
<td>41..49</td>
<td>C1..C9</td>
</tr>
<tr>
<td>J..R</td>
<td>4A..52</td>
<td>D1..D9</td>
</tr>
<tr>
<td>S..Z</td>
<td>53..5A</td>
<td>E2..E9</td>
</tr>
<tr>
<td>0..9</td>
<td>30..39</td>
<td>F0..F9</td>
</tr>
<tr>
<td>(space)</td>
<td>20</td>
<td>40</td>
</tr>
<tr>
<td>"</td>
<td>22</td>
<td>7F</td>
</tr>
<tr>
<td>%</td>
<td>25</td>
<td>6C</td>
</tr>
<tr>
<td>&amp;</td>
<td>26</td>
<td>50</td>
</tr>
<tr>
<td>'</td>
<td>27</td>
<td>7D</td>
</tr>
<tr>
<td>(</td>
<td>28</td>
<td>4D</td>
</tr>
<tr>
<td>)</td>
<td>29</td>
<td>5D</td>
</tr>
<tr>
<td>*</td>
<td>2A</td>
<td>5C</td>
</tr>
<tr>
<td>+</td>
<td>2B</td>
<td>4E</td>
</tr>
<tr>
<td>,</td>
<td>2C</td>
<td>6B</td>
</tr>
<tr>
<td>-</td>
<td>2D</td>
<td>60</td>
</tr>
<tr>
<td>.</td>
<td>2E</td>
<td>4B</td>
</tr>
<tr>
<td>/</td>
<td>2F</td>
<td>61</td>
</tr>
<tr>
<td>:</td>
<td>3A</td>
<td>7A</td>
</tr>
<tr>
<td>;</td>
<td>3B</td>
<td>5E</td>
</tr>
<tr>
<td>&lt;</td>
<td>3C</td>
<td>4C</td>
</tr>
<tr>
<td>=</td>
<td>3D</td>
<td>7E</td>
</tr>
<tr>
<td>&gt;</td>
<td>3E</td>
<td>6E</td>
</tr>
<tr>
<td>?</td>
<td>3F</td>
<td>6F</td>
</tr>
<tr>
<td>_</td>
<td>5F</td>
<td>6D</td>
</tr>
</table>
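<p>The table above can be expressed as a small lookup routine. The
following is a minimal sketch in plain C and is not the code in
<code>source/common/putil.c</code>; the function names
<code>invariant_ascii_to_ebcdic</code> and
<code>is_invariant_ascii_string</code> are hypothetical and exist only for
this illustration. It works on numeric byte values, so the mapping itself
does not depend on the compiler's own charset.</p>

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an ICU API): map the ASCII byte value of an
 * invariant character to its EBCDIC byte value, following the table above.
 * Returns -1 if the byte is not an invariant character.
 */
int invariant_ascii_to_ebcdic(unsigned char a)
{
    /* Space and punctuation rows of the table, in table order. */
    static const unsigned char punct_ascii[] = {
        0x20, 0x22, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C,
        0x2D, 0x2E, 0x2F, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F, 0x5F
    };
    static const unsigned char punct_ebcdic[] = {
        0x40, 0x7F, 0x6C, 0x50, 0x7D, 0x4D, 0x5D, 0x5C, 0x4E, 0x6B,
        0x60, 0x4B, 0x61, 0x7A, 0x5E, 0x4C, 0x7E, 0x6E, 0x6F, 0x6D
    };
    size_t i;

    /* Letters and digits are contiguous in ASCII but split in EBCDIC. */
    if (a >= 0x61 && a <= 0x69) return 0x81 + (a - 0x61); /* a..i */
    if (a >= 0x6A && a <= 0x72) return 0x91 + (a - 0x6A); /* j..r */
    if (a >= 0x73 && a <= 0x7A) return 0xA2 + (a - 0x73); /* s..z */
    if (a >= 0x41 && a <= 0x49) return 0xC1 + (a - 0x41); /* A..I */
    if (a >= 0x4A && a <= 0x52) return 0xD1 + (a - 0x4A); /* J..R */
    if (a >= 0x53 && a <= 0x5A) return 0xE2 + (a - 0x53); /* S..Z */
    if (a >= 0x30 && a <= 0x39) return 0xF0 + (a - 0x30); /* 0..9 */

    for (i = 0; i < sizeof punct_ascii; ++i) {
        if (punct_ascii[i] == a) {
            return punct_ebcdic[i];
        }
    }
    return -1;
}

/*
 * Hypothetical helper: returns 1 if every byte of an ASCII-encoded string
 * is an invariant character, 0 otherwise. Assumes the input is ASCII.
 */
int is_invariant_ascii_string(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (invariant_ascii_to_ebcdic((unsigned char)*s) < 0) {
            return 0;
        }
    }
    return 1;
}
```

<p>On an ASCII platform, a locale ID such as <code>"en_US"</code> would pass
the check, while a string containing <code>@</code> would not, because
<code>@</code> is not in the table.</p>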
<h3><a name="ImportantNotesDefaultCP" href="#ImportantNotesDefaultCP" id=
"ImportantNotesDefaultCP">Using the Default Codepage</a></h3>
<p>ICU has code to determine the default codepage of the system or process.
This default codepage can be used to convert <code>char *</code> strings to
and from Unicode.</p>
<p>Depending on system design, setup and APIs, it may not always be
possible to find a default codepage that fully works as expected. For
example,</p>
<ul>
<li>On Windows there are three encodings in use at the same time. Unicode
(UTF-16) is always used inside of Windows, while for <code>char *</code>
encodings there are two classes, called "ANSI" and "OEM" codepages. ICU
will use the ANSI codepage. Note that the OEM codepage is used by default
for console window output.</li>
<li>On some UNIX-type systems, non-standard names are used for encodings,
or non-standard encodings are used altogether. Although ICU supports over
200 encodings in its standard build and many more aliases for them, it
will not be able to recognize such non-standard names.</li>
<li>Some systems do not have a notion of a system or process codepage,
and may not have APIs for that.</li>
</ul>
<p>If you have a way of detecting a default codepage name that is more
appropriate for your application, set that name with
<code>ucnv_setDefaultName()</code> as the first ICU function call. This
ensures that the internally cached default converter is instantiated
from your preferred name.</p>
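<p>As one possible way of detecting a codepage name on a POSIX system, an
application could ask the C library for the codeset of the current locale.
The sketch below uses only the POSIX calls <code>setlocale()</code> and
<code>nl_langinfo(CODESET)</code>; the function name
<code>detect_posix_codeset</code> is hypothetical, and whether the returned
name is one that ICU recognizes is an assumption that depends on the
platform.</p>

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an ICU API): return the codeset name of the
 * process locale as reported by the C library, or NULL if none is reported.
 * The returned pointer refers to storage owned by the C library.
 */
const char *detect_posix_codeset(void)
{
    const char *name;

    /* Adopt the character-type locale from the environment (LANG, LC_*). */
    setlocale(LC_CTYPE, "");

    name = nl_langinfo(CODESET);
    if (name == NULL || name[0] == '\0') {
        return NULL;
    }
    return name;
}

/*
 * An application that links against ICU could then pass a non-NULL result
 * to ucnv_setDefaultName() before making any other ICU call, for example:
 *
 *     const char *name = detect_posix_codeset();
 *     if (name != NULL) {
 *         ucnv_setDefaultName(name);
 *     }
 */
```

<p>The ICU call is shown only in a comment so that the sketch compiles
without the ICU headers.</p>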
<p>Starting in ICU 2.0, when a converter for the default codepage cannot be
opened, a fallback default codepage name and converter will be used. On
most platforms, this will be US-ASCII. For z/OS (OS/390), ibm-1047-s390 is
the default fallback codepage. For AS/400 (iSeries), ibm-37 is the default
fallback codepage. This default fallback codepage is used when the
operating system uses a non-standard name for its default codepage, or
when the converter was not packaged with ICU. This feature allows ICU to
run in unusual computing environments without failing completely.</p>
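<p>The fallback rule described above can be sketched as follows. This is an
illustration only and is not ICU's implementation: the function names, the
platform keys <code>"zos"</code> and <code>"as400"</code>, and the
<code>converter_opened</code> flag are all hypothetical. Only the three
fallback codepage names are taken from the text above.</p>

```c
#include <string.h>

/*
 * Hypothetical helper: pick the fallback codepage name for a platform.
 * The platform keys are made up for this sketch.
 */
const char *fallback_codepage_name(const char *platform)
{
    if (platform != NULL) {
        if (strcmp(platform, "zos") == 0) {
            return "ibm-1047-s390";   /* z/OS (OS/390) */
        }
        if (strcmp(platform, "as400") == 0) {
            return "ibm-37";          /* AS/400 (iSeries) */
        }
    }
    return "US-ASCII";                /* most platforms */
}

/*
 * Hypothetical helper: use the detected name if a converter could be
 * opened for it, otherwise use the platform's fallback name.
 * converter_opened stands in for the result of trying to open a converter.
 */
const char *effective_codepage_name(const char *detected,
                                    int converter_opened,
                                    const char *platform)
{
    if (detected != NULL && converter_opened) {
        return detected;
    }
    return fallback_codepage_name(platform);
}
```

<p>For example, a detected name that cannot be opened on a hypothetical
<code>"zos"</code> platform would yield <code>ibm-1047-s390</code>, while
the same failure on most other platforms would yield
<code>US-ASCII</code>.</p>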
<h3><a name="ImportantNotesWindows" href="#ImportantNotesWindows" id=
"ImportantNotesWindows">Windows Platform</a></h3>