f49a604ffe
are not explictely extern, and a C++ one for declarations in the first member of for() loops. X-SVN-Rev: 2112
303 lines
16 KiB
HTML
303 lines
16 KiB
HTML
<html>
|
|
<!DOCTYPE html public "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<title>ICU Coding Guidelines</title>
|
|
<meta http-equiv=Content-Type content="text/html; charset=iso-8859-1">
|
|
</head>
|
|
<body bgcolor="#FFFFFF">
|
|
<h1>International Components for Unicode</h1>
|
|
<h2>ICU Coding Guidelines</h2>
|
|
<hr>
|
|
<p>This page describes guidelines for writing code for the International
|
|
Components for Unicode and how to add it to the project.</p>
|
|
<ul>
|
|
<li><a href="#coding">General Guidelines</a></li>
|
|
<li><a href="#addfiles">Adding files to ICU</a></li>
|
|
<li><a href="#tools">Build Tools</a></li>
|
|
</ul>
|
|
<a name="coding"></a>
|
|
<h2 align="center">General Guidelines</h2>
|
|
<ul>
|
|
<li>Constants (#define, enum items, const): uppercase. <code>UBREAKITERATOR_DONE</code>,
|
|
<code>UBIDI_DEFAULT_LTR</code>, <code>ULESS</code>.</li>
|
|
<li>Variables and functions: mixed-case, starting with lowercase. <code>getLength()</code>.</li>
|
|
<li>Types (class, struct, enum, union): mixed-case, starting with uppercase.
|
|
<code>class DateFormatSymbols</code>.</li>
|
|
<li>We use getProperty() and setProperty() functions.</li>
|
|
<li>getLength(), getSomethingAt(index/offset).</li>
|
|
<li>Where we return a number of items, it is <code>countItems()</code>
|
|
- <i>not</i> getItemCount() (even if we do not need to actually <i>count</i>
|
|
in the implementation of that member function).</li>
|
|
<li>Ranges of indexes: we specify a range of indexes by having <i>start</i>
|
|
and <i>limit</i> parameters, with names or suffixes like that. Such
|
|
a range contains indexes from start to limit-1, i.e., it is an interval
|
|
that is left-closed and right-open. Mathematically, [start..limit[.</li>
|
|
<li>Functions that take a buffer (pointer) and a length argument with
|
|
a default value so that the function determines the length of the input
|
|
itself (for text, calling u_strlen()), that default value should be
|
|
-1. Any other negative or undefined value constitutes an error.</li>
|
|
<li>Primitive types: they are defined by utypes.h or a header file that
|
|
it includes. The most common ones are uint8_t, uint16_t, uint32_t, int8_t,
|
|
int16_t, int32_t, UTextOffset (signed), and UErrorCode.</li>
|
|
<li>File names (.h, .c, .cpp, data files, etc.): 8.3, all lowercase.</li>
|
|
<li>Language extensions and standards: do not use any features (language
|
|
extensions or library functions) that are proprietary and do not work
|
|
on all C or C++ compilers.<br>
|
|
For example, in Microsoft Visual C++, you should go to Project Settings(alt-f7)->All
|
|
Configurations->C/C++->Customize, and set Disable Language Extensions.</li>
|
|
<li>Tabs and indentation: no tab characters (\x09), save files with spaces
|
|
instead.<br>
|
|
Indentation is of size 4.</li>
|
|
<li>Documentation: We use Javadoc-style in-file documentation with <a href="http://www.doxygen.org">doxygen</a>.
|
|
We have a printed version of the manual in our "library", and our files
|
|
use this.</li>
|
|
<li>You should not place multiple statements into one line. You should
|
|
not have if() or loop heads followed by their bodies on the same line.</li>
|
|
<li>Placements of curly braces {}: each of us subscribes to different
|
|
philosophies. Please try to do it reasonably and consistently. It is
|
|
a good idea to use the style of a file when you modify it, instead of
|
|
mixing in your favorite style.<br>
|
|
The one thing that we ask for is to not have if() and loop bodies without
|
|
curly braces.</li>
|
|
<li>Function declarations should have one line with the return type and
|
|
all the import, extern, and export declarations, and the function name
|
|
and signature at the beginning of the next line.
|
|
<pre>
|
|
U_CAPI int32_t U_EXPORT2
|
|
u_formatMessage(...);
|
|
</pre>
|
|
</li>
|
|
<li>Always use <code>static</code> for variables, functions, and constants
|
|
that are not exported explicitely by a header file. Some platforms are
|
|
confused if non-static symbols are not explictely declared extern, and
|
|
will not be able to build ICU and link against it. </li>
|
|
<li>Error codes: The ICU API functions for both C and C++ are using a
|
|
pointer or a reference to UErrorCode whereever we expect things to possibly
|
|
go wrong.
|
|
<p>In C, this is a pointer, and it must be checked for <code>NULL</code>.
|
|
In C++, this is a reference. In both cases, it must be checked for
|
|
an error code already being in there:</p>
|
|
<pre>
|
|
U_CAPI const UBiDiLevel * U_EXPORT2
|
|
ubidi_getLevels(UBiDi *pBiDi, UErrorCode *pErrorCode) {
|
|
UTextOffset start, length;
|
|
|
|
if(pErrorCode==NULL || U_FAILURE(*pErrorCode)) {
|
|
return NULL;
|
|
} else if(pBiDi==NULL || (length=pBiDi->length)<=0) {
|
|
*pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
|
|
return NULL;
|
|
}
|
|
|
|
...
|
|
return result;
|
|
}
|
|
</pre>
|
|
This prevents the API function from doing anything on data that is not
|
|
valid in a chain of function calls and relieves the caller from checking
|
|
the error code after each call.</li>
|
|
<li>Decide whether your module is part of the "common" or the "i18n" API
|
|
collection. Use the appropriate macros, like <code>U_COMMON_IMPLEMENTATION</code>,
|
|
<code>U_I18N_IMPLEMENTATION</code>, <code>U_COMMON_API</code>, <code>U_I18N_API</code>.
|
|
See <code>utypes.h</code>.</li>
|
|
<li>If we have the same module in C and in C++, then there will be two
|
|
header files, one for each language, even if one uses the other. For
|
|
example, ubidi.h for C and bidi.h for C++.</li>
|
|
<li>
|
|
<p>Platform dependencies are dealt with in the header files that <code>utypes.h</code>
|
|
includes. They are <code>platform.h</code> and its more specific cousins
|
|
like <code>pwin32.h</code> for Windows, which define basic types,
|
|
and <code>putil.h</code>, which defines platform utilities.</p>
|
|
<p></p>
|
|
<strong>Important: </strong>Outside of these files, and a small number
|
|
of implementation files that depend on platform differences (like <code>umutex.c</code>),
|
|
<i>no</i> ICU source code may have <i>any</i> <code>#ifdef <i>OperatingSystemName</i></code>
|
|
instructions.
|
|
<p></p>
|
|
</li>
|
|
<li> For mutual-exclusion (mutex) blocks, there should be no function
|
|
calls within a mutex block. The idea behind this is to prevent deadlocks
|
|
from occuring later. There should be as little code inside a mutex block
|
|
as possible to minimize the performance degradation from blocked threads.
|
|
</li>
|
|
</ul>
|
|
<h3>C Guidelines</h3>
|
|
<ul>
|
|
<li>Since we don't have classes to subdivide the namespace, we use prefixes
|
|
to avoid name collisions. Some of those prefixes contain a 3- (or sometimes
|
|
4-) letter module identifier. Very general names like <code>u_charDirection()</code>
|
|
don't have a module identifier in their prefix.
|
|
<ul>
|
|
<li>For POSIX replacements, we prepend the (all lowercase) POSIX function
|
|
name with "u_": <code>u_strlen()</code>.</li>
|
|
<li>For other API functions, we prepend a 'u', the module identifier
|
|
(if appropriate), and an underscore '_', followed by the <i>mixed-case</i>
|
|
function name: <code>u_charDirection()</code>, <code>ubidi_setPara()</code>.</li>
|
|
<li>For types (struct, enum, union), we prepend a "U", often "U<module
|
|
identifier>" directly to the typename, without an underscore. <code>UComparisonResult</code>.</li>
|
|
<li>For <code>#define</code>d constants and macros , we prepend a
|
|
"U_", often "U<module identifier>_" with an underscore to the
|
|
uppercase macro name. <code>U_ZERO_ERROR</code>, <code>U_SUCCESS()</code>.</li>
|
|
</ul>
|
|
</li>
|
|
<li>Function declarations need to be of the form <code>U_CAPI return-type
|
|
U_EXPORT2</code> to satisfy all compilers' needs.</li>
|
|
<li>Functions that roughly compare to constructors and destructors are
|
|
called umod_open() and umod_close().
|
|
<pre>
|
|
U_CAPI UBiDi * U_EXPORT2
|
|
ubidi_open();
|
|
|
|
U_CAPI UBiDi * U_EXPORT2
|
|
ubidi_openSized(UTextOffset maxLength, UTextOffset maxRunCount);
|
|
|
|
U_CAPI void U_EXPORT2
|
|
ubidi_close(UBiDi *pBiDi);
|
|
</pre>
|
|
</li>
|
|
<li>In cases like BreakIterator and NumberFormat, instead of having several
|
|
different 'open' APIs for each kind of instances, use an enum selector.</li>
|
|
<li>File names begin with a "u".</li>
|
|
<li>For memory allocation in C implementation files for ICU, the functions/macros
|
|
in <code>cmemory.h</code> must be used. </li>
|
|
<li>Do not use C++ style comments in C files and in headers that will be included in C files.
|
|
Some of the supported platforms are allergic to C++ style comments in C files.</li>
|
|
</ul>
|
|
<h3>C++ Guidelines</h3>
|
|
<ul>
|
|
<li>Classes and their members do not need a "U" or any other prefix.</li>
|
|
<li>Class APIs need to be declared like <code>class U_I18N_API SimpleDateFormat</code>
|
|
or like <code>class U_COMMON_API UCharCharacterIterator</code>.</li>
|
|
<li>Class member functions should be only declared, not inline-implemented,
|
|
in the class declaration. Inline implementations may follow after the
|
|
class declaration in the same file.</li>
|
|
<li>We are using <code>XP_PLUSPLUS</code> to make sure the compiler does
|
|
C++, not <code>__cplusplus</code>.</li>
|
|
<li>We do not use exceptions, and we do not use templates (at least on
|
|
the API).</li>
|
|
<li>File names do not begin with a "u".</li>
|
|
<li>
|
|
<h4>Adoption of objects:</h4>
|
|
Some constructors and factory functions take pointers to objects that
|
|
they <i>adopt</i>. This means that the newly created object will contain
|
|
a pointer to the adoptee and takes over ownership and lifecycle control.
|
|
If an error occurs while creating the new object - and thus in the code
|
|
that adopts an object - then the semantics used within ICU must be <i>adopt-on-call</i>
|
|
(as opposed to, e.g., adopt-on-success):
|
|
<ul>
|
|
<li>General: A constructor or factory function that adopts an object
|
|
does so in all cases, even if an error occurs and a <code>UErrorCode</code>
|
|
is set. This means that either the adoptee is deleted immediately
|
|
or its pointer is stored in the new object. The former case is most
|
|
common when the constructor or factory function is called and the
|
|
<code>UErrorCode</code> already indicates a failure. In the latter
|
|
case, the new object must take care of deleting the adoptee once
|
|
it is deleted itself regardless of whether the constructor was successful.</li>
|
|
<li>Constructors: The code that creates the object with the <code>new</code>
|
|
operator must check the resulting pointer returned by <code>new</code>
|
|
and delete any adoptees if it is <code>0</code> because the constructor
|
|
was not called. (Typically, a <code>UErrorCode</code> must be set
|
|
to <code>U_MEMORY_ALLOCATION_ERROR</code>.)</li>
|
|
<li>Factory functions (<code>createInstance()</code>): The factory
|
|
function must set a <code>U_MEMORY_ALLOCATION_ERROR</code> and delete
|
|
any adoptees if it cannot allocate the new object. If the construction
|
|
of the object fails otherwise, then the factory function must delete
|
|
it - and it in turn must delete its adoptees. As a result, a factory
|
|
function always returns either a valid object and a successful <code>UErrorCode</code>,
|
|
or a <code>0</code> pointer and a failure <code>UErrorCode</code>.<br>
|
|
Example:
|
|
<pre>
|
|
Calendar*
|
|
Calendar::createInstance(TimeZone* zone, UErrorCode& errorCode) {
|
|
if(U_FAILURE(errorCode)) {
|
|
delete zone;
|
|
return 0;
|
|
}
|
|
// since the Locale isn't specified, use the default locale
|
|
Calendar* c = new GregorianCalendar(zone, Locale::getDefault(), errorCode);
|
|
if(c == 0) {
|
|
errorCode = U_MEMORY_ALLOCATION_ERROR;
|
|
delete zone;
|
|
} else if(U_FAILURE(errorCode)) {
|
|
delete c;
|
|
c = 0;
|
|
}
|
|
return c;
|
|
}
|
|
</pre>
|
|
</ul>
|
|
</li>
|
|
<li>For memory allocation in C++ implementation files for ICU, the standard
|
|
<code>new</code> and <code>delete</code> operators (or the C functions/macros
|
|
in <code>cmemory.h</code>) must be used. </li>
|
|
<li>Iterations through <code>for()</code> loops should not use declarations
|
|
in the first part of the loop. The scoping of such declarations
|
|
has gone through two revisions about this, and some compilers do not
|
|
comply to the latest scoping. Declarations of loop variables should be
|
|
outside these loops. </li>
|
|
</ul>
|
|
<a name="addfiles"></a>
|
|
<h2 align="center">Adding files to ICU</h2>
|
|
<h3>Adding .c, .cpp, and .h files</h3>
|
|
<p>In order to add compilable files to ICU, you not only need to add them
|
|
to the source code control system in the appropriate folder, but also
|
|
add them to the build environment.</p>
|
|
<p>The first step is to choose one of the ICU libraries:</p>
|
|
<ol>
|
|
<li>The <em>common</em> library provides mostly low-level utilities and
|
|
basic APIs that often do not make use of Locales. Examples are APIs
|
|
that deal with character properties, the Locale APIs themselves, and
|
|
ResourceBundle APIs.</li>
|
|
<li>The <em>i18n</em> library provides Locale-dependent and -using APIs,
|
|
like for collation and formatting, that are most useful for internationalized
|
|
user input and output.</li>
|
|
</ol>
|
|
<p>Put the source code files into the folder <code>icu/source/<i>library-name</i></code>.</p>
|
|
<p>Then add them to the build system:
|
|
<ul>
|
|
<li>
|
|
<p>For most platforms, add the expected .o files to <code>icu/source/<i>library-name</i>/Makefile.in</code>,
|
|
to the <code>OBJECTS</code> variable.</p>
|
|
<p>Add the <i>public</i> header files to the <code>HEADERS</code> variable.</p>
|
|
</li>
|
|
<li>For Microsoft Visual C++ 6.0, add all the source code files to <code>icu/source/<i>library-name</i>/<i>library-name</i>.dsp</code>.
|
|
If you don't have Visual C++, then try to add the filenames to the project
|
|
file manually; it is a text file, and this part should be fairly obvious.</li>
|
|
</ul>
|
|
<p></p>
|
|
<p>You also need to add test code to <code>icu/source/test/cintltest</code>
|
|
for C APIs and to <code>icu/source/test/intltest</code> for C++ APIs.</p>
|
|
<p>All the API functions must be called by the test code (100% API coverage),
|
|
and at least 85% of the implementation code should be exercised by the
|
|
tests (>=85% code coverage).</p>
|
|
<ol>
|
|
<li>For C, create test code using the <code>log_err()</code>, <code>log_info()</code>,
|
|
and <code>log_verbose()</code> APIs from <code>cintltst.h</code> (which
|
|
uses <code>ctest.h</code>), and check it into the appropriate folder.</li>
|
|
<li>In order to get your C test code called, you have to add its toplevel
|
|
function and a descriptive test module path to the test system by calling
|
|
<code>addTest()</code>. The function that makes the call to <code>addTest()</code>
|
|
ultimately has to be called by <code>addAllTests()</code> in <code>calltest.c</code>.
|
|
Groups of tests typically have a common <code>addGroup()</code> function
|
|
that calls <code>addTest()</code> for the test functions in its group,
|
|
according to the common part of the test module path.</li>
|
|
<li>Add that test code to the build system, too. Modify <code>Makefile.in</code>
|
|
and the appropriate <code>.dsp</code> file like for the library code.</li>
|
|
</ol>
|
|
<a name="tools"></a>
|
|
<h2 align="center">Build Tools</h2>
|
|
<p>We are using the following tools to build ICU:</p>
|
|
<ul>
|
|
<li>GNU make version 3.76.1 and up.</li>
|
|
<li>GNU cc version 2.95.2 and up. (egcs merged back into gcc.)</li>
|
|
<li>autoconf version 2.13 and up.</li>
|
|
<li>autoconf needs m4 version 1.4 and up.</li>
|
|
<li>Platform-specific compilers as listed with the supported platforms.<br>
|
|
(For a complete list of supported platforms, see the <a href="http://oss.software.ibm.com/icu/tech_faq.html">ICU
|
|
Technical FAQ</a>.)</li>
|
|
<li>On Windows: Microsoft Visual C++ 6.0 with the latest Service Packs.</li>
|
|
</ul>
|
|
</body>
|
|
</html>
|