2003-01-11 00:28:48 +00:00
|
|
|
Copyright (c) 2003, International Business Machines Corporation and others. All Rights Reserved.
|
|
|
|
uciter8: Lenient reading of 8-bit Unicode with a UCharIterator
|
|
|
|
|
|
|
|
This sample demonstrates reading
|
|
|
|
8-bit Unicode text leniently, accepting a mix of UTF-8 and CESU-8
|
|
|
|
and also accepting single surrogates.
|
|
|
|
UTF-8-style macros are defined as well as a UCharIterator.
|
|
|
|
The macros are incomplete (do not assemble code points from pairs of surrogates)
|
|
|
|
but sufficient for the iterator.
|
|
|
|
|
2003-02-10 19:56:40 +00:00
|
|
|
If you wish to use the lenient-UTF/CESU-8 UCharIterator in a context outside of
|
|
|
|
this sample, then copy the uit_len8.c file,
|
|
|
|
as well as either the uit_len8.h header or just the prototype that it contains.
|
|
|
|
|
|
|
|
*** Warning: ***
|
|
|
|
This UCharIterator reads an arbitrary mix of UTF-8 and CESU-8 text.
|
|
|
|
It does not conform to any one Unicode charset specification,
|
|
|
|
and its use may lead to security risks.
|
|
|
|
|
|
|
|
|
2003-01-11 00:28:48 +00:00
|
|
|
Files:
|
|
|
|
uciter8.c Main source file in C
|
2003-02-10 19:56:40 +00:00
|
|
|
uit_len8.c Lenient-UTF/CESU-8 UCharIterator implementation
|
|
|
|
uit_len8.h Header file with the prototoype for the lenient-UTF/CESU-8 UCharIterator
|
2003-01-11 00:28:48 +00:00
|
|
|
uciter8.dsw Windows MSVC workspace. Double-click this to get started.
|
|
|
|
uciter8.dsp Windows MSVC project file
|
|
|
|
|
|
|
|
To Build uciter8 on Windows
|
|
|
|
1. Install and build ICU
|
|
|
|
2. In MSVC, open the workspace file icu\samples\uciter8\uciter8.dsw
|
|
|
|
3. Choose a Debug or Release build.
|
|
|
|
4. Build.
|
2003-02-10 19:56:40 +00:00
|
|
|
|
2003-01-11 00:28:48 +00:00
|
|
|
To Run on Windows
|
|
|
|
1. Start a command shell window
|
|
|
|
2. Add ICU's bin directory to the path, e.g.
|
|
|
|
set PATH=c:\icu\bin;%PATH%
|
|
|
|
(Use the path to where ever ICU is on your system.)
|
|
|
|
3. cd into the uciter8 directory, e.g.
|
|
|
|
cd c:\icu\source\samples\uciter8\debug
|
|
|
|
4. Run it
|
|
|
|
uciter8
|
|
|
|
|
|
|
|
To Build on Unixes
|
|
|
|
1. Build ICU.
|
|
|
|
Specify an ICU install directory when running configure,
|
|
|
|
using the --prefix option. The steps to build ICU will look something
|
|
|
|
like this:
|
|
|
|
cd <icu directory>/source
|
|
|
|
runConfigureICU <platform-name> --prefix <icu install directory> [other options]
|
|
|
|
gmake all
|
2003-02-10 19:56:40 +00:00
|
|
|
|
2003-01-11 00:28:48 +00:00
|
|
|
2. Install ICU,
|
|
|
|
gmake install
|
2003-02-10 19:56:40 +00:00
|
|
|
|
2003-01-11 00:28:48 +00:00
|
|
|
3. Compile
|
|
|
|
cd <icu directory>/source/samples/uciter8
|
|
|
|
gmake ICU_PREFIX=<icu install directory)
|
2003-02-10 19:56:40 +00:00
|
|
|
|
2003-01-11 00:28:48 +00:00
|
|
|
To Run on Unixes
|
|
|
|
cd <icu directory>/source/samples/uciter8
|
|
|
|
|
|
|
|
gmake ICU_PREFIX=<icu install directory> check
|
|
|
|
-or-
|
|
|
|
|
|
|
|
export LD_LIBRARY_PATH=<icu install directory>/lib:.:$LD_LIBRARY_PATH
|
|
|
|
uciter8
|
2003-02-10 19:56:40 +00:00
|
|
|
|
|
|
|
|
2003-01-11 00:28:48 +00:00
|
|
|
Note: The name of the LD_LIBRARY_PATH variable is different on some systems.
|
|
|
|
If in doubt, run the sample using "gmake check", and note the name of
|
|
|
|
the variable that is used there. LD_LIBRARY_PATH is the correct name
|
|
|
|
for Linux and Solaris.
|