moved many things from wxString reference page to the wxString overview; updated some old/incoherent information; added some DIA-drawn graphs showing the different UTF8/UCS2 representations used by wxString
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@57140 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
This commit is contained in:
parent
e215c9959c
commit
727aa9062b
BIN
docs/doxygen/images/overview_unicode_codes.dia
Normal file
Binary file not shown.
Binary file not shown.
Before Width: | Height: | Size: 7.2 KiB After Width: | Height: | Size: 16 KiB |
BIN
docs/doxygen/images/overview_wxstring_encoding.dia
Normal file
Binary file not shown.
BIN
docs/doxygen/images/overview_wxstring_encoding.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 63 KiB |
@ -51,6 +51,8 @@ unhindered through any traditional transport channels.
|
||||
|
||||
@section overview_mbconv_string Background: The wxString Class
|
||||
|
||||
@todo rewrite this overview; it's not up to date with wxString changes
|
||||
|
||||
If you have compiled wxWidgets in Unicode mode, the wxChar type will become
|
||||
identical to wchar_t rather than char, and a wxString stores wxChars. Hence,
|
||||
all wxString manipulation in your application will then operate on Unicode
|
||||
|
@ -13,10 +13,12 @@
|
||||
Classes: wxString, wxArrayString, wxStringTokenizer
|
||||
|
||||
@li @ref overview_string_intro
|
||||
@li @ref overview_string_internal
|
||||
@li @ref overview_string_comparison
|
||||
@li @ref overview_string_advice
|
||||
@li @ref overview_string_related
|
||||
@li @ref overview_string_tuning
|
||||
@li @ref overview_string_settings
|
||||
|
||||
|
||||
<hr>
|
||||
@ -24,25 +26,104 @@ Classes: wxString, wxArrayString, wxStringTokenizer
|
||||
|
||||
@section overview_string_intro Introduction
|
||||
|
||||
wxString is a class which represents a character string of arbitrary length and
|
||||
containing arbitrary characters. The ASCII NUL character is allowed, but be
|
||||
aware that in the current string implementation some methods might not work
|
||||
correctly in this case.
|
||||
wxString is a class which represents a Unicode string of arbitrary length and
|
||||
containing arbitrary characters.
|
||||
|
||||
Since wxWidgets 3.0 wxString internally uses UCS-2 (basically 2-byte per
|
||||
character wchar_t) under Windows and UTF-8 under Unix, Linux and
|
||||
OS X to store its content. Much work has been done to make
|
||||
existing code using ANSI string literals work as before.
|
||||
The @c NUL character is allowed, but be
|
||||
aware that in the current string implementation some methods might not work
|
||||
correctly in this case. @todo still true?
|
||||
|
||||
This class has all the standard operations you can expect to find in a string
|
||||
class: dynamic memory management (string extends to accommodate new
|
||||
characters), construction from other strings, C strings, wide character C strings
|
||||
characters), construction from other strings, C strings, wide character C strings
|
||||
and characters, assignment operators, access to individual characters, string
|
||||
concatenation and comparison, substring extraction, case conversion, trimming and padding (with
|
||||
spaces), searching and replacing and both C-like @c printf (wxString::Printf)
|
||||
concatenation and comparison, substring extraction, case conversion, trimming and
|
||||
padding (with spaces), searching and replacing and both C-like @c printf (wxString::Printf)
|
||||
and stream-like insertion functions as well as much more - see wxString for a
|
||||
list of all functions.
|
||||
|
||||
The wxString class has been completely rewritten for wxWidgets 3.0 but much work
|
||||
has been done to make existing code using ANSI string literals work as it did
|
||||
in previous versions.
|
||||
|
||||
|
||||
@section overview_string_internal Internal wxString encoding
|
||||
|
||||
Since wxWidgets 3.0 wxString internally uses <b>UCS-2</b> (with Unicode
|
||||
code units stored in @c wchar_t) under Windows and <b>UTF-8</b> (with Unicode
|
||||
code units stored in @c char) under Unix, Linux and Mac OS X to store its content.
|
||||
|
||||
For definitions of <em>code units</em> and <em>code points</em> terms, please
|
||||
see the @ref overview_unicode_encodings paragraph.
|
||||
|
||||
Note that there is a difference between UCS-2 and UTF-16: the first is a fixed-length
|
||||
encoding, without <em>surrogate pairs</em>, while the latter is a
|
||||
variable-length encoding. Except for this the two encodings are identical.
|
||||
|
||||
For simplicity of implementation, wxString when <tt>wxUSE_UNICODE_WCHAR==1</tt>
|
||||
(e.g. on Windows) uses UCS-2 and thus doesn't know anything about surrogate pairs;
|
||||
it always assumes 1 code unit per code point, while this is really true only for
|
||||
characters in the @e BMP (Basic Multilingual Plane).
|
||||
Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user
|
||||
code has to take care of <em>surrogate pair</em> handling itself.
|
||||
(Note however that Windows itself has built-in support for surrogate pairs in UTF-16,
|
||||
such as for drawing strings on screen.)
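
The surrogate handling that user code has to do by hand can be sketched in plain standard C++. This is only an illustration, not part of the wxWidgets API: the function names are hypothetical and the input is assumed to be a sequence of UTF-16 code units held in a @c std::u16string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A high (leading) surrogate is a code unit in the range 0xD800..0xDBFF.
inline bool is_high_surrogate(char16_t unit)
{
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// A low (trailing) surrogate is a code unit in the range 0xDC00..0xDFFF.
inline bool is_low_surrogate(char16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Combines a valid surrogate pair into the code point it encodes.
inline char32_t combine_surrogates(char16_t high, char16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}

// Counts code points rather than code units: a well-formed surrogate pair
// counts once, and an unpaired surrogate is counted as one code point.
std::size_t count_code_points_utf16(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() && is_low_surrogate(s[i + 1]))
            ++i; // skip the low half of the pair
        ++count;
    }
    return count;
}
```

For U+10300 the sequence holds two code units (0xD800 0xDF00) but this count is 1, which is the distinction a UCS-2 based length does not make.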
|
||||
|
||||
When instead <tt>wxUSE_UNICODE_UTF8==1</tt> (e.g. on Linux and Mac OS X)
|
||||
wxString handles UTF8 multi-byte sequences just fine, so that you can use
|
||||
UTF8 in a completely transparent way:
|
||||
|
||||
Example:
|
||||
@code
|
||||
// first test, using exotic characters outside of the Unicode BMP:
|
||||
|
||||
wxString test = wxString::FromUTF8("\xF0\x90\x8C\x80");
|
||||
// U+10300 is "OLD ITALIC LETTER A" and is part of Unicode Plane 1
|
||||
// in UTF8 it's encoded as 0xF0 0x90 0x8C 0x80
|
||||
|
||||
// it's a single Unicode code-point encoded as:
|
||||
// - a UTF16 surrogate pair under Windows
|
||||
// - a UTF8 multi-byte sequence under Linux
|
||||
// (without considering the final NUL)
|
||||
|
||||
wxPrintf("wxString reports a length of %d character(s)", test.length());
|
||||
// prints "wxString reports a length of 1 character(s)" on Linux
|
||||
// prints "wxString reports a length of 2 character(s)" on Windows
|
||||
// since wxString stores UCS-2 there and doesn't handle surrogate pairs!
|
||||
|
||||
|
||||
// second test, this time using characters part of the Unicode BMP:
|
||||
|
||||
wxString test2 = wxString::FromUTF8("\x41\xC3\xA0\xE2\x82\xAC");
|
||||
// this is the UTF8 encoding of capital letter A followed by
|
||||
// 'small letter a with grave' followed by the 'euro sign'
|
||||
|
||||
// they are 3 Unicode code-points encoded as:
|
||||
// - 3 UTF16 code units under Windows
|
||||
// - 6 UTF8 code units under Linux
|
||||
// (without considering the final NUL)
|
||||
|
||||
wxPrintf("wxString reports a length of %d character(s)", test2.length());
|
||||
// prints "wxString reports a length of 3 character(s)" on Linux
|
||||
// prints "wxString reports a length of 3 character(s)" on Windows
|
||||
@endcode
|
||||
|
||||
To better explain what is stated above, consider the second string of the example
|
||||
above; it's composed of 3 characters and the final @c NUL:
|
||||
|
||||
@image html overview_wxstring_encoding.png
|
||||
|
||||
As you can see, UCS2/UTF16 encoding is straightforward (for characters in the @e BMP)
|
||||
and in this example the UCS2-encoded wxString takes 8 bytes.
|
||||
UTF8 encoding is more elaborate and in this example takes 7 bytes.
|
||||
|
||||
The type used by wxString to store Unicode code units is called wxStringCharType.
|
||||
|
||||
In general, for strings containing many Latin characters UTF8 provides a big
|
||||
advantage in memory footprint compared to UTF16, but requires some more processing
|
||||
for common operations like e.g. length calculation.
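
The memory footprint trade-off can be checked with a small standard C++ sketch. The helper names are hypothetical and nothing here is wxWidgets API; the input is assumed to be a sequence of valid code points in a @c std::u32string, and the terminating @c NUL is not counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of 8-bit code units UTF-8 needs for one code point.
std::size_t utf8_units(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Number of 16-bit code units UTF-16 needs for one code point.
std::size_t utf16_units(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;
}

// Total storage in bytes when the text is encoded as UTF-8.
std::size_t utf8_bytes(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t cp : text)
        total += utf8_units(cp);
    return total;
}

// Total storage in bytes when the text is encoded as UTF-16.
std::size_t utf16_bytes(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t cp : text)
        total += 2 * utf16_units(cp);
    return total;
}
```

For the pure ASCII text "hello" this gives 5 bytes in UTF-8 against 10 in UTF-16, while for the three characters of the example above both encodings need 6 bytes before the terminator.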
|
||||
|
||||
|
||||
|
||||
@section overview_string_comparison Comparison to Other String Classes
|
||||
|
||||
@ -50,52 +131,53 @@ The advantages of using a special string class instead of working directly with
|
||||
C strings are so obvious that there is a huge number of such classes available.
|
||||
The most important advantage is the need to always remember to allocate/free
|
||||
memory for C strings; working with fixed size buffers almost inevitably leads
|
||||
to buffer overflows. At last, C++ has a standard string class (std::string). So
|
||||
to buffer overflows. At last, C++ has a standard string class (@c std::string). So
|
||||
why the need for wxString? There are several advantages:
|
||||
|
||||
@li <b>Efficiency:</b> Since wxWidgets 3.0 wxString uses std::string (UTF8
|
||||
mode under Linux, Unix and OS X) or std::wstring (MSW) internally by
|
||||
default to store its constent. wxString will therefore inherit the
|
||||
performance characteristics from std::string.
|
||||
@li <b>Efficiency:</b> Since wxWidgets 3.0 wxString uses @c std::string (in UTF8
|
||||
mode under Linux, Unix and OS X) or @c std::wstring (in UTF16 mode under Windows)
|
||||
internally by default to store its contents. wxString will therefore inherit the
|
||||
performance characteristics from @c std::string.
|
||||
@li <b>Compatibility:</b> This class tries to combine almost full compatibility
|
||||
with the old wxWidgets 1.xx wxString class, some reminiscence to MFC
|
||||
CString class and 90% of the functionality of std::string class.
|
||||
@li <b>Rich set of functions:</b> Some of the functions present in wxString are very
|
||||
useful but don't exist in most of other string classes: for example,
|
||||
wxString::AfterFirst, wxString::BeforeLast, wxString::operators or
|
||||
wxString::Printf. Of course, all the standard string operations are
|
||||
supported as well.
|
||||
@li <b>Unicode wxString is Unicode friendly:</b> it allows to easily convert to
|
||||
and from ANSI and Unicode strings (see the @ref overview_unicode "unicode overview"
|
||||
for more details) and maps to @c wstring transparently.
|
||||
with the old wxWidgets 1.xx wxString class, some reminiscence of MFC's
|
||||
CString class and 90% of the functionality of @c std::string class.
|
||||
@li <b>Rich set of functions:</b> Some of the functions present in wxString are
|
||||
very useful but don't exist in most other string classes: for example,
|
||||
wxString::AfterFirst, wxString::BeforeLast, wxString::Printf.
|
||||
Of course, all the standard string operations are supported as well.
|
||||
@li <b>wxString is Unicode friendly:</b> it allows you to easily convert to
|
||||
and from ANSI and Unicode strings (see @ref overview_unicode
|
||||
for more details) and maps to @c std::wstring transparently.
|
||||
@li <b>Used by wxWidgets:</b> And, of course, this class is used everywhere
|
||||
inside wxWidgets so there is no performance loss which would result from
|
||||
conversions of objects of any other string class (including std::string) to
|
||||
conversions of objects of any other string class (including @c std::string) to
|
||||
wxString internally by wxWidgets.
|
||||
|
||||
However, there are several problems as well. The most important one is probably
|
||||
that there are often several functions to do exactly the same thing: for
|
||||
example, to get the length of the string either one of wxString::length(),
|
||||
wxString::Len() or wxString::Length() may be used. The first function, as
|
||||
almost all the other functions in lowercase, is std::string compatible. The
|
||||
almost all the other functions in lowercase, is @c std::string compatible. The
|
||||
second one is the "native" wxString version and the last one is the wxWidgets
|
||||
1.xx way.
|
||||
|
||||
So which is better to use? The usage of the std::string compatible functions is
|
||||
So which is better to use? The usage of the @c std::string compatible functions is
|
||||
strongly advised! It will both make your code more familiar to other C++
|
||||
programmers (who are supposed to have knowledge of std::string but not of
|
||||
programmers (who are supposed to have knowledge of @c std::string but not of
|
||||
wxString), let you reuse the same code in both wxWidgets and other programs (by
|
||||
just typedefing wxString as std::string when used outside wxWidgets) and by
|
||||
just typedefing wxString as @c std::string when used outside wxWidgets) and by
|
||||
staying compatible with future versions of wxWidgets which will probably start
|
||||
using std::string sooner or later too.
|
||||
using @c std::string sooner or later too.
|
||||
|
||||
In the situations where there is no corresponding std::string function, please
|
||||
In the situations where there is no corresponding @c std::string function, please
|
||||
try to use the new wxString methods and not the old wxWidgets 1.xx variants
|
||||
which are deprecated and may disappear in future versions.
|
||||
|
||||
|
||||
@section overview_string_advice Advice About Using wxString
|
||||
|
||||
@subsection overview_string_implicitconv Implicit conversions
|
||||
|
||||
Probably the main trap with using this class is the implicit conversion
|
||||
operator to <tt>const char*</tt>. It is advised that you use wxString::c_str()
|
||||
instead to clearly indicate when the conversion is done. Specifically, the
|
||||
@ -124,8 +206,8 @@ because the argument of @c puts() is known to be of the type
|
||||
<tt>const char*</tt>, this is @b not done for @c printf() which is a function
|
||||
with variable number of arguments (and whose arguments are of unknown types).
|
||||
So this call may do any number of things (including displaying the correct
|
||||
string on screen), although the most likely result is a program crash. The
|
||||
solution is to use wxString::c_str(). Just replace this line with this:
|
||||
string on screen), although the most likely result is a program crash.
|
||||
The solution is to use wxString::c_str(). Just replace this line with this:
|
||||
|
||||
@code
|
||||
printf("Hello, %s!\n", output.c_str());
|
||||
@ -138,10 +220,43 @@ its contents are completely arbitrary. The solution to this problem is also
|
||||
easy, just make the function return wxString instead of a C string.
|
||||
|
||||
This leads us to the following general advice: all functions taking string
|
||||
arguments should take <tt>const wxString</tt> (this makes assignment to the
|
||||
arguments should take <tt>const wxString&</tt> (this makes assignment to the
|
||||
strings inside the function faster) and all functions returning strings
|
||||
should return wxString - this makes it safe to return local variables.
|
||||
|
||||
Finally note that wxString uses the current locale encoding to convert any C string
|
||||
literal to Unicode. The same is done for converting to and from @c std::string
|
||||
and for the return value of c_str().
|
||||
For this conversion, the @a wxConvLibc class instance is used.
|
||||
See wxCSConv and wxMBConv.
|
||||
|
||||
|
||||
@subsection overview_string_iterating Iterating wxString's characters
|
||||
|
||||
As previously described, when <tt>wxUSE_UNICODE_UTF8==1</tt>, wxString internally
|
||||
uses the variable-length UTF8 encoding.
|
||||
Accessing a UTF-8 string by index can be very @b inefficient because
|
||||
a single character is represented by a variable number of bytes so that
|
||||
the entire string has to be parsed in order to find the character.
|
||||
Since iterating over a string by index is a common programming technique and
|
||||
was also possible and encouraged by wxString using the access operator[](),
|
||||
wxString implements caching of the last used index so that iterating over
|
||||
a string is a linear operation even in UTF-8 mode.
|
||||
|
||||
It is nonetheless recommended to use @b iterators (instead of index based
|
||||
access) like this:
|
||||
|
||||
@code
|
||||
wxString s = "hello";
|
||||
wxString::const_iterator i;
|
||||
for (i = s.begin(); i != s.end(); ++i)
|
||||
{
|
||||
wxUniChar uni_ch = *i;
|
||||
// do something with it
|
||||
}
|
||||
@endcode
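
To see why index access into UTF-8 needs a scan, and why remembering the last position keeps sequential access linear, consider the following sketch in plain standard C++. It is not wxString's actual implementation: the class and function names are hypothetical, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence that starts with this lead byte
// (assumes well-formed input).
inline std::size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

// Maps a code point index to a byte offset, remembering the last lookup so
// that asking for index i+1 right after index i only advances by one sequence.
class Utf8IndexCache
{
public:
    explicit Utf8IndexCache(const std::string& utf8) : m_str(utf8) {}

    std::size_t byte_offset(std::size_t index)
    {
        // Going backwards invalidates the cached position: restart the scan.
        if (index < m_lastIndex)
        {
            m_lastIndex = 0;
            m_lastOffset = 0;
        }
        while (m_lastIndex < index)
        {
            assert(m_lastOffset < m_str.size());
            m_lastOffset += utf8_seq_len(static_cast<unsigned char>(m_str[m_lastOffset]));
            ++m_lastIndex;
        }
        return m_lastOffset;
    }

private:
    std::string m_str;            // private copy of the UTF-8 bytes
    std::size_t m_lastIndex = 0;  // code point index of the cached position
    std::size_t m_lastOffset = 0; // byte offset of the cached position
};
```

Without the cached pair every lookup would start from byte 0, which makes a loop over all indices quadratic. An iterator avoids the problem altogether because it always knows its own byte position.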
|
||||
|
||||
|
||||
|
||||
@section overview_string_related String Related Functions and Classes
|
||||
|
||||
@ -158,7 +273,7 @@ these problems: wxIsEmpty() verifies whether the string is empty (returning
|
||||
case-insensitive string comparison function known either as @c stricmp() or
|
||||
@c strcasecmp() on different platforms.
|
||||
|
||||
The <tt>@<wx/string.h@></tt> header also defines wxSnprintf and wxVsnprintf
|
||||
The <tt>@<wx/string.h@></tt> header also defines ::wxSnprintf and ::wxVsnprintf
|
||||
functions which should be used instead of the inherently dangerous standard
|
||||
@c sprintf() and which use @c snprintf() instead which does buffer size checks
|
||||
whenever possible. Of course, you may also use wxString::Printf which is also
|
||||
@ -180,7 +295,7 @@ wxStrings.
|
||||
|
||||
@note This section is strictly about performance issues and is absolutely not
|
||||
necessary to read for using wxString class. Please skip it unless you feel
|
||||
familiar with profilers and relative tools.
|
||||
familiar with profilers and related tools.
|
||||
|
||||
For performance reasons wxString doesn't allocate exactly the amount of
|
||||
memory needed for each string. Instead, it adds a small amount of space to each
|
||||
@ -244,5 +359,16 @@ really consider fine tuning wxString for your application).
|
||||
It goes without saying that a profiler should be used to measure the precise
|
||||
difference the change to @c EXTRA_ALLOC makes to your program.
|
||||
|
||||
|
||||
@section overview_string_settings wxString Related Compilation Settings
|
||||
|
||||
Much work has been done to make existing code using ANSI string literals
|
||||
work as it did before version 3.0.
|
||||
If you nonetheless need to have a wxString that uses @c wchar_t
|
||||
on Unix and Linux, too, you can specify this on the command line with the
|
||||
@c configure @c --disable-utf8 switch or you can consider using wxUString
|
||||
or @c std::wstring instead.
|
||||
|
||||
|
||||
*/
|
||||
|
||||
|
@ -49,30 +49,34 @@ other services should be ready to deal with Unicode.
|
||||
|
||||
When working with Unicode, it's important to define the meaning of some terms.
|
||||
|
||||
A @e glyph is a particular image that represents a @e character or part of a character.
|
||||
A <b><em>glyph</em></b> is a particular image that represents a character or part
|
||||
of a character.
|
||||
Any character may have one or more glyphs associated with it; e.g. some of the possible
|
||||
glyphs for the capital letter 'A' are:
|
||||
|
||||
@image html overview_unicode_glyphs.png
|
||||
|
||||
Unicode assigns each character of almost any existing alphabet/script a number,
|
||||
which is called <em>code point</em>; it's typically indicated in documentation
|
||||
which is called <b><em>code point</em></b>; it's typically indicated in documentation
|
||||
manuals and on the Unicode website as @c U+xxxx where @c xxxx is a hexadecimal number.
|
||||
|
||||
The Unicode standard divides the space of all possible code points into @e planes;
|
||||
a plane is a range of 65,536 (0x10000) contiguous Unicode code points.
|
||||
Planes are numbered from 0 to 16, where the first one is the @e BMP, or Basic
|
||||
Multilingual Plane.
|
||||
The BMP contains characters for all modern languages, and a large number of
|
||||
special characters. The other planes in fact contain mainly historic scripts,
|
||||
special-purpose characters or are unused.
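
Since each plane spans 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. A minimal sketch in standard C++, with hypothetical function names:

```cpp
#include <cassert>

// Returns the Unicode plane (0..16) that contains the given code point.
unsigned plane_of(char32_t cp)
{
    assert(cp <= 0x10FFFF); // highest valid Unicode code point
    return static_cast<unsigned>(cp >> 16);
}

// The BMP (Basic Multilingual Plane) is plane 0.
bool in_bmp(char32_t cp)
{
    return plane_of(cp) == 0;
}
```

For instance U+20AC (euro sign) lies in plane 0, while U+10300 (Old Italic letter A) lies in plane 1.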
|
||||
|
||||
Code points are represented in computer memory as a sequence of one or more
|
||||
<em>code units</em>, where a code unit is a unit of memory: 8, 16, or 32 bits.
|
||||
<b><em>code units</em></b>, where a code unit is a unit of memory: 8, 16, or 32 bits.
|
||||
More precisely, a code unit is the minimal bit combination that can represent a
|
||||
unit of encoded text for processing or interchange.
|
||||
|
||||
The @e UTF or Unicode Transformation Formats are algorithms mapping the Unicode
|
||||
code points to code unit sequences. The simplest of them is <b>UTF-32</b> where
|
||||
each code unit is composed by 32 bits (4 bytes) and each code point is represented
|
||||
by a single code unit.
|
||||
each code unit consists of 32 bits (4 bytes) and each code point is always
|
||||
represented by a single code unit (fixed length encoding).
|
||||
(Note that even UTF-32 is still not completely trivial as the mapping is different
|
||||
for little and big-endian architectures). UTF-32 is commonly used under Unix systems for
|
||||
internal representation of Unicode strings.
|
||||
@ -81,6 +85,7 @@ Another very widespread standard is <b>UTF-16</b> which is used by Microsoft Win
|
||||
it encodes the first (approximately) 64 thousand Unicode code points
|
||||
(the BMP plane) using 16-bit code units (2 bytes) and uses a pair of 16-bit code
|
||||
units to encode the characters beyond this. These pairs are called <em>surrogate pairs</em>.
|
||||
Thus UTF16 uses a variable number of code units to encode each code point.
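
The mapping from a code point to UTF-16 code units can be sketched as follows. This is plain standard C++ with a hypothetical function name, not a wxWidgets function; it assumes the input is a valid code point outside the surrogate range.

```cpp
#include <cassert>
#include <string>

// Encodes one code point as one or two UTF-16 code units.
std::u16string encode_utf16(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    assert(cp < 0xD800 || cp > 0xDFFF); // surrogate values are not characters

    std::u16string out;
    if (cp < 0x10000)
    {
        // BMP code point: stored directly in a single code unit.
        out.push_back(static_cast<char16_t>(cp));
    }
    else
    {
        // Beyond the BMP: split the 20 remaining bits into two 10-bit halves.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return out;
}
```

U+20AC stays a single code unit, whereas U+10300 becomes the pair 0xD800 0xDF00.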
|
||||
|
||||
Finally, the most widespread encoding used for the external Unicode storage
|
||||
(e.g. files and network protocols) is <b>UTF-8</b> which is byte-oriented and so
|
||||
@ -107,7 +112,7 @@ Typically when UTF8 is used, code units are stored into @c char types, since
|
||||
@c char are 8bit wide on almost all systems; when using UTF16 typically code
|
||||
units are stored into @c wchar_t types since @c wchar_t is at least 16bits on
|
||||
all systems. This is also the approach used by wxString.
|
||||
See @ref overview_wxstring for more info.
|
||||
See @ref overview_string for more info.
|
||||
|
||||
See also http://unicode.org/glossary/ for the official definitions of the
|
||||
terms reported above.
|
||||
@ -123,8 +128,8 @@ programs require the Microsoft Layer for Unicode to run on Windows 95/98/ME.
|
||||
|
||||
However, unlike the Unicode build mode of the previous versions of wxWidgets, this
|
||||
support is mostly transparent: you can still continue to work with the @b narrow
|
||||
(i.e. current-locale-encoded @c char*) strings even if @b wide
|
||||
(i.e. UTF16/UCS2-encoded @c wchar_t* or UTF8-encoded @c char) strings are also
|
||||
(i.e. current locale-encoded @c char*) strings even if @b wide
|
||||
(i.e. UTF16/UCS2-encoded @c wchar_t* or UTF8-encoded @c char*) strings are also
|
||||
supported. Any wxWidgets function accepts arguments of either type as both
|
||||
kinds of strings are implicitly converted to wxString, so both
|
||||
@code
|
||||
@ -132,7 +137,7 @@ wxMessageBox("Hello, world!");
|
||||
@endcode
|
||||
and the somewhat less usual
|
||||
@code
|
||||
wxMessageBox(L"Salut \u00e0 toi!"); // 00E0 is "Latin Small Letter a with Grave"
|
||||
wxMessageBox(L"Salut \u00E0 toi!"); // U+00E0 is "Latin Small Letter a with Grave"
|
||||
@endcode
|
||||
work as expected.
|
||||
|
||||
@ -147,9 +152,10 @@ in the case of gcc). In particular, the most common encoding used under
|
||||
modern Unix systems is UTF-8 and as the string above is not a valid UTF-8 byte
|
||||
sequence, nothing would be displayed at all in this case. Thus it is important
|
||||
to <b>never use 8-bit (instead of 7-bit) characters directly in the program source</b>
|
||||
but use wide strings or, alternatively, write
|
||||
but use wide strings or, alternatively, write:
|
||||
@code
|
||||
wxMessageBox(wxString::FromUTF8("Salut \xc3\xa0 toi!"));
|
||||
wxMessageBox(wxString::FromUTF8("Salut \xC3\xA0 toi!"));
|
||||
// in UTF8 the character U+00E0 is encoded as the two bytes 0xC3 0xA0
|
||||
@endcode
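
The byte values used in such escape sequences follow directly from the UTF-8 encoding rules. The sketch below derives them in plain standard C++; the function name is hypothetical, it is not part of wxWidgets, and it assumes the input is a valid code point.

```cpp
#include <cassert>
#include <string>

// Encodes one code point as a sequence of 1 to 4 UTF-8 bytes.
std::string encode_utf8(char32_t cp)
{
    assert(cp <= 0x10FFFF);

    std::string out;
    if (cp < 0x80)
    {
        out.push_back(static_cast<char>(cp));                        // 0xxxxxxx
    }
    else if (cp < 0x800)
    {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));          // 110xxxxx
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));        // 10xxxxxx
    }
    else if (cp < 0x10000)
    {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));         // 1110xxxx
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else
    {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));         // 11110xxx
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

Applied to U+00E0 this yields the bytes 0xC3 0xA0, and applied to U+20AC it yields 0xE2 0x82 0xAC.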
|
||||
|
||||
In a similar way, wxString provides access to its contents as either @c wchar_t or
|
||||
@ -327,6 +333,7 @@ different encoding of it. So you need to be able to convert the data to various
|
||||
representations and the wxString methods wxString::ToAscii(), wxString::ToUTF8()
|
||||
(or its synonym wxString::utf8_str()), wxString::mb_str(), wxString::c_str() and
|
||||
wxString::wc_str() can be used for this.
|
||||
|
||||
The first of them should be used for strings containing 7-bit ASCII characters
|
||||
only, anything else will be replaced by some substitution character.
|
||||
wxString::mb_str() converts the string to the encoding used by the current locale
|
||||
|
@ -6,59 +6,6 @@
|
||||
// Licence: wxWindows license
|
||||
/////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
/**
|
||||
@class wxStringBuffer
|
||||
|
||||
This tiny class allows you to conveniently access the wxString internal buffer
|
||||
as a writable pointer without any risk of forgetting to restore the string
|
||||
to the usable state later.
|
||||
|
||||
For example, assuming you have a low-level OS function called
|
||||
@c "GetMeaningOfLifeAsString(char *)" returning the value in the provided
|
||||
buffer (which must be writable, of course) you might call it like this:
|
||||
|
||||
@code
|
||||
wxString theAnswer;
|
||||
GetMeaningOfLifeAsString(wxStringBuffer(theAnswer, 1024));
|
||||
if ( theAnswer != "42" )
|
||||
wxLogError("Something is very wrong!");
|
||||
@endcode
|
||||
|
||||
Note that the exact usage of this depends on whether or not wxUSE_STL is
|
||||
enabled. If wxUSE_STL is enabled, wxStringBuffer creates a separate empty
|
||||
character buffer, and if wxUSE_STL is disabled, it uses GetWriteBuf() from
|
||||
wxString, keeping the same buffer wxString uses intact. In other words,
|
||||
relying on wxStringBuffer containing the old wxString data is not a good
|
||||
idea if you want to build your program both with and without wxUSE_STL.
|
||||
|
||||
@library{wxbase}
|
||||
@category{data}
|
||||
*/
|
||||
class wxStringBuffer
|
||||
{
|
||||
public:
|
||||
/**
|
||||
Constructs a writable string buffer object associated with the given string
|
||||
and containing enough space for at least @a len characters.
|
||||
Basically, this is equivalent to calling wxString::GetWriteBuf() and
|
||||
saving the result.
|
||||
*/
|
||||
wxStringBuffer(const wxString& str, size_t len);
|
||||
|
||||
/**
|
||||
Restores the string passed to the constructor to the usable state by calling
|
||||
wxString::UngetWriteBuf() on it.
|
||||
*/
|
||||
~wxStringBuffer();
|
||||
|
||||
/**
|
||||
Returns the writable pointer to a buffer of the size at least equal to the
|
||||
length specified in the constructor.
|
||||
*/
|
||||
wxStringCharType* operator wxStringCharType *();
|
||||
};
|
||||
|
||||
|
||||
|
||||
/**
|
||||
@class wxString
|
||||
@ -68,66 +15,29 @@ public:
|
||||
version wxWidgets 3.0.
|
||||
|
||||
wxString is a class representing a Unicode character string.
|
||||
wxString uses @c std::string internally to store its content
|
||||
unless this is not supported by the compiler or disabled
|
||||
specifically when building wxWidgets and it therefore inherits
|
||||
many features from @c std::string. Most implementations of
|
||||
@c std::string are thread-safe and don't use reference counting.
|
||||
By default, wxString uses @c std::string internally even if
|
||||
wxUSE_STL is not defined.
|
||||
wxString uses @c std::basic_string internally (even if @c wxUSE_STL is not defined)
|
||||
to store its content (unless this is not supported by the compiler or disabled
|
||||
specifically when building wxWidgets) and it therefore inherits
|
||||
many features from @c std::basic_string. (Note that most implementations of
|
||||
@c std::basic_string are thread-safe and don't use reference counting.)
|
||||
|
||||
wxString now internally uses UTF-16 under Windows and UTF-8 under
|
||||
Unix, Linux and OS X to store its content. Note that when iterating
|
||||
over a UTF-16 string under Windows, the user code has to take care
|
||||
of surrogate pair handling whereas Windows itself has built-in
|
||||
support pairs in UTF-16, such as for drawing strings on screen.
|
||||
|
||||
Much work has been done to make existing code using ANSI string literals
|
||||
work as before. If you nonetheless need to have a wxString that uses wchar_t
|
||||
on Unix and Linux, too, you can specify this on the command line with the
|
||||
@c configure @c --disable-utf8 switch or you can consider using wxUString
|
||||
or std::wstring instead.
|
||||
|
||||
Accessing a UTF-8 string by index can be very inefficient because
|
||||
a single character is represented by a variable number of bytes so that
|
||||
the entire string has to be parsed in order to find the character.
|
||||
Since iterating over a string by index is a common programming technique and
|
||||
was also possible and encouraged by wxString using the access operator[]()
|
||||
wxString implements caching of the last used index so that iterating over
|
||||
a string is a linear operation even in UTF-8 mode.
|
||||
|
||||
It is nonetheless recommended to use iterators (instead of index based
|
||||
access) like this:
|
||||
|
||||
@code
|
||||
wxString s = "hello";
|
||||
wxString::const_iterator i;
|
||||
for (i = s.begin(); i != s.end(); ++i)
|
||||
{
|
||||
wxUniChar uni_ch = *i;
|
||||
// do something with it
|
||||
}
|
||||
@endcode
|
||||
|
||||
Please see the @ref overview_string and the @ref overview_unicode for more
|
||||
information about it.
|
||||
|
||||
wxString uses the current locale encoding to convert any C string
|
||||
literal to Unicode. The same is done for converting to and from
|
||||
@c std::string and for the return value of c_str().
|
||||
For this conversion, the @a wxConvLibc class instance is used.
|
||||
See wxCSConv and wxMBConv.
|
||||
|
||||
wxString implements most of the methods of the @c std::string class.
|
||||
These standard functions are only listed here, but they are not
|
||||
fully documented in this manual. Please see the STL documentation.
|
||||
These @c std::basic_string standard functions are only listed here, but
|
||||
they are not fully documented in this manual; see the STL documentation
|
||||
(http://www.cppreference.com/wiki/string/start) for more info.
|
||||
The behaviour of all these functions is identical to the behaviour
|
||||
described there.
|
||||
|
||||
You may notice that wxString sometimes has several functions which do
|
||||
the same thing like Length(), Len() and length() which
|
||||
all return the string length. In all cases of such duplication the
|
||||
@c std::string compatible method should be used.
|
||||
the same thing like Length(), Len() and length() which all return the
|
||||
string length. In all cases of such duplication the @c std::string
|
||||
compatible methods should be used.
|
||||
|
||||
For information about the internal encoding used by wxString and
|
||||
for important warnings and advice for using it, please read
|
||||
the @ref overview_string.
|
||||
|
||||
In wxWidgets 3.0 wxString always stores Unicode strings, so you should
|
||||
be sure to read also @ref overview_unicode.
|
||||
|
||||
|
||||
@section string_construct Constructors and assignment operators
|
||||
@ -229,6 +139,7 @@ public:
|
||||
original string is not modified and the function returns the extracted
|
||||
substring.
|
||||
|
||||
@li at()
|
||||
@li substr()
|
||||
@li Mid()
|
||||
@li operator()()
|
||||
@ -1344,14 +1255,6 @@ public:
|
||||
STL reference for their documentation.
|
||||
*/
|
||||
//@{
|
||||
size_t length() const;
|
||||
size_type size() const;
|
||||
size_type max_size() const;
|
||||
size_type capacity() const;
|
||||
void reserve(size_t sz);
|
||||
|
||||
void resize(size_t nSize, wxUniChar ch = '\0');
|
||||
|
||||
wxString& append(const wxString& str, size_t pos, size_t n);
|
||||
wxString& append(const wxString& str);
|
||||
wxString& append(const char *sz, size_t n);
|
||||
@ -1366,8 +1269,13 @@ public:
wxString& assign(size_t n, wxUniChar ch);
wxString& assign(const_iterator first, const_iterator last);

wxUniChar at(size_t n) const;
wxUniCharRef at(size_t n);

void clear();

size_type capacity() const;

int compare(const wxString& str) const;
int compare(size_t nStart, size_t nLen, const wxString& str) const;
int compare(size_t nStart, size_t nLen,
@ -1377,6 +1285,8 @@ public:
int compare(size_t nStart, size_t nLen,
            const wchar_t* sz, size_t nCount = npos) const;

wxCStrData data() const;

bool empty() const;

wxString& erase(size_type pos = 0, size_type n = npos);
@ -1387,6 +1297,28 @@ public:
size_t find(const char* sz, size_t nStart = 0, size_t n = npos) const;
size_t find(const wchar_t* sz, size_t nStart = 0, size_t n = npos) const;
size_t find(wxUniChar ch, size_t nStart = 0) const;
size_t find_first_of(const char* sz, size_t nStart = 0) const;
size_t find_first_of(const wchar_t* sz, size_t nStart = 0) const;
size_t find_first_of(const char* sz, size_t nStart, size_t n) const;
size_t find_first_of(const wchar_t* sz, size_t nStart, size_t n) const;
size_t find_first_of(wxUniChar c, size_t nStart = 0) const;
size_t find_last_of(const wxString& str, size_t nStart = npos) const;
size_t find_last_of(const char* sz, size_t nStart = npos) const;
size_t find_last_of(const wchar_t* sz, size_t nStart = npos) const;
size_t find_last_of(const char* sz, size_t nStart, size_t n) const;
size_t find_last_of(const wchar_t* sz, size_t nStart, size_t n) const;
size_t find_last_of(wxUniChar c, size_t nStart = npos) const;
size_t find_first_not_of(const wxString& str, size_t nStart = 0) const;
size_t find_first_not_of(const char* sz, size_t nStart = 0) const;
size_t find_first_not_of(const wchar_t* sz, size_t nStart = 0) const;
size_t find_first_not_of(const char* sz, size_t nStart, size_t n) const;
size_t find_first_not_of(const wchar_t* sz, size_t nStart, size_t n) const;
size_t find_first_not_of(wxUniChar ch, size_t nStart = 0) const;
size_t find_last_not_of(const wxString& str, size_t nStart = npos) const;
size_t find_last_not_of(const char* sz, size_t nStart = npos) const;
size_t find_last_not_of(const wchar_t* sz, size_t nStart = npos) const;
size_t find_last_not_of(const char* sz, size_t nStart, size_t n) const;
size_t find_last_not_of(const wchar_t* sz, size_t nStart, size_t n) const;

wxString& insert(size_t nPos, const wxString& str);
wxString& insert(size_t nPos, const wxString& str, size_t nStart, size_t n);
@ -1397,6 +1329,13 @@ public:
void insert(iterator it, const_iterator first, const_iterator last);
void insert(iterator it, size_type n, wxUniChar ch);

size_t length() const;

size_type max_size() const;

void reserve(size_t sz);
void resize(size_t nSize, wxUniChar ch = '\0');

wxString& replace(size_t nStart, size_t nLen, const wxString& str);
wxString& replace(size_t nStart, size_t nLen, size_t nCount, wxUniChar ch);
wxString& replace(size_t nStart, size_t nLen,
@ -1423,12 +1362,10 @@ public:
size_t rfind(const wchar_t* sz, size_t nStart = npos, size_t n = npos) const;
size_t rfind(wxUniChar ch, size_t nStart = npos) const;

size_type size() const;
wxString substr(size_t nStart = 0, size_t nLen = npos) const;

void swap(wxString& str);

//@}

};

/**
@ -1510,3 +1447,55 @@ public:
wxChar* operator wxChar *();
};


/**
@class wxStringBuffer

This tiny class allows you to conveniently access the wxString internal buffer
as a writable pointer without any risk of forgetting to restore the string
to the usable state later.

For example, assuming you have a low-level OS function called
@c "GetMeaningOfLifeAsString(char *)" returning the value in the provided
buffer (which must be writable, of course) you might call it like this:

@code
wxString theAnswer;
GetMeaningOfLifeAsString(wxStringBuffer(theAnswer, 1024));
if ( theAnswer != "42" )
    wxLogError("Something is very wrong!");
@endcode

Note that the exact usage of this depends on whether or not @c wxUSE_STL is
enabled. If @c wxUSE_STL is enabled, wxStringBuffer creates a separate empty
character buffer, and if @c wxUSE_STL is disabled, it uses GetWriteBuf() from
wxString, keeping the same buffer wxString uses intact. In other words,
relying on wxStringBuffer containing the old wxString data is not a good
idea if you want to build your program both with and without @c wxUSE_STL.
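
The difference can be pictured with a small stand-in class. The sketch below
is not the wxWidgets implementation: ScratchBuffer is a hypothetical analogue,
built on @c std::string, of the separate-buffer behaviour described above for
@c wxUSE_STL builds, and this GetMeaningOfLifeAsString() is a dummy defined
only for the example. It shows why the previous contents of the string are not
visible through the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for wxStringBuffer (assumption: separate-buffer mode).
// It hands out its own zero-filled buffer and copies the result back into the
// target string when it goes out of scope.
class ScratchBuffer
{
public:
    ScratchBuffer(std::string& target, std::size_t len)
        : m_target(target), m_buf(len + 1, '\0') // +1 keeps a trailing NUL
    {
    }

    // Restore the string: copy everything up to the first NUL back into it.
    ~ScratchBuffer() { m_target.assign(m_buf.data()); }

    // Writable pointer to at least `len` characters.
    operator char*() { return m_buf.data(); }

    ScratchBuffer(const ScratchBuffer&) = delete;
    ScratchBuffer& operator=(const ScratchBuffer&) = delete;

private:
    std::string& m_target;
    std::vector<char> m_buf;
};

// Dummy low-level function, defined only for this example.
void GetMeaningOfLifeAsString(char* out)
{
    std::strcpy(out, "42");
}

std::string AskForTheAnswer()
{
    std::string theAnswer = "old contents";
    // The temporary lives until the end of the full expression, so the
    // copy-back happens right after the call returns.
    GetMeaningOfLifeAsString(ScratchBuffer(theAnswer, 1024));
    return theAnswer;
}
```

In this sketch the buffer starts out empty, so code that expects to find the
old string contents in it would only work with the GetWriteBuf()-based
variant.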

@library{wxbase}
@category{data}
*/
class wxStringBuffer
{
public:
/**
Constructs a writable string buffer object associated with the given string
and containing enough space for at least @a len characters.
Basically, this is equivalent to calling wxString::GetWriteBuf() and
saving the result.
*/
wxStringBuffer(const wxString& str, size_t len);

/**
Restores the string passed to the constructor to the usable state by calling
wxString::UngetWriteBuf() on it.
*/
~wxStringBuffer();

/**
Returns the writable pointer to a buffer of the size at least equal to the
length specified in the constructor.
*/
wxStringCharType* operator wxStringCharType *();
};