Mention wxString caching in UTF-8 mode

git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@55344 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
Robert Roebling 2008-08-29 12:46:41 +00:00
parent 3f5506cfd3
commit a6919a6aca
2 changed files with 33 additions and 19 deletions


@@ -232,11 +232,12 @@ internal representation and this implies that it can't guarantee constant-time
 access to N-th element of the string any longer as to find the position of this
 character in the string we have to examine all the preceding ones. Usually this
 doesn't matter much because most algorithms used on the strings examine them
-sequentially anyhow, but it can have serious consequences for the algorithms
-using indexed access to string elements as they typically acquire O(N^2) time
+sequentially anyhow and because wxString implements a cache for iterating over
+the string by index but it can have serious consequences for algorithms
+using random access to string elements as they typically acquire O(N^2) time
 complexity instead of O(N) where N is the length of the string.
 
-To return to the linear complexity, indexed access should be replaced with
+Even despite caching the index, indexed access should be replaced with
 sequential access using string iterators. For example a typical loop:
 @code
 wxString s("hello");
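
To make the O(N^2) figure in the hunk above concrete, here is a minimal sketch in plain C++ that does not use wxWidgets at all. It assumes a valid UTF-8 `std::string` and simply counts how many bytes have to be skipped to reach each character. All helper names (`isLeadByte`, `charCount`, `byteOffsetOfChar`, `stepsIndexedLoop`, `stepsSequentialLoop`) are hypothetical and chosen for illustration only; this is not how wxString is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; none of these are wxWidgets APIs.

// A byte starts a new UTF-8 character unless it is a 10xxxxxx continuation byte.
bool isLeadByte(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) != 0x80;
}

// Number of characters in a (valid) UTF-8 string.
std::size_t charCount(const std::string& utf8)
{
    std::size_t n = 0;
    for (char c : utf8)
        if (isLeadByte(c))
            ++n;
    return n;
}

// Byte offset of the n-th character, found by scanning from the start.
// 'steps' is incremented once for every byte that is skipped.
std::size_t byteOffsetOfChar(const std::string& utf8, std::size_t n,
                             std::size_t& steps)
{
    std::size_t pos = 0;
    for (; n > 0 && pos < utf8.size(); --n) {
        do {
            ++pos;
            ++steps;
        } while (pos < utf8.size() && !isLeadByte(utf8[pos]));
    }
    return pos;
}

// Total bytes skipped when every character is looked up by index,
// restarting the scan from the beginning each time.
std::size_t stepsIndexedLoop(const std::string& utf8)
{
    std::size_t steps = 0;
    const std::size_t count = charCount(utf8);
    for (std::size_t i = 0; i < count; ++i)
        byteOffsetOfChar(utf8, i, steps);
    return steps;
}

// Total bytes visited when the string is walked once from front to back.
std::size_t stepsSequentialLoop(const std::string& utf8)
{
    std::size_t steps = 0;
    for (std::size_t pos = 0; pos < utf8.size(); ++pos)
        ++steps;
    return steps;
}
```

For the five-character string "hello", `stepsIndexedLoop` skips 0 + 1 + 2 + 3 + 4 = 10 bytes while `stepsSequentialLoop` visits 5, and the gap grows quadratically with the length of the string.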


@@ -65,28 +65,41 @@ public:
 /**
 @class wxString
 
+The wxString class has been completely rewritten for wxWidgets 3.0
+and this change was actually the main reason for the calling that
+version wxWidgets 3.0.
+
 wxString is a class representing a Unicode character string.
 wxString uses @c std::string internally to store its content
 unless this is not supported by the compiler or disabled
-specifically when building wxWidgets. Therefore wxString
-inherits many features from @c std::string. Most
-implementations of @c std::string are thread-safe and don't
-use reference counting. By default, wxString uses @c std::string
-internally even if wxUSE_STL is not defined.
+specifically when building wxWidgets and it therefore inherits
+many features from @c std::string. Most implementations of
+@c std::string are thread-safe and don't use reference counting.
+By default, wxString uses @c std::string internally even if
+wxUSE_STL is not defined.
 
-Since wxWidgets 3.0 wxString internally uses UCS-2 (basically 2-byte per
-character wchar_t and nearly the same as UTF-16) under Windows and
-UTF-8 under Unix, Linux and OS X to store its content.
+wxString now internally uses UTF-16 under Windows and UTF-8 under
+Unix, Linux and OS X to store its content. Note that when iterating
+over a UTF-16 string under Windows, the user code has to take care
+of surrogate pair handling whereas Windows itself has built-in
+support pairs in UTF-16, such as for drawing strings on screen.
 
 Much work has been done to make existing code using ANSI string literals
-work as before. If you need to have a wxString that uses wchar_t on Unix
-and Linux, too, you can specify this on the command line with the
-@c configure @c --disable-utf8 switch.
+work as before. If you nonetheless need to have a wxString that uses wchar_t
+on Unix and Linux, too, you can specify this on the command line with the
+@c configure @c --disable-utf8 switch or you can consider using wxUString
+or std::wstring instead.
 
-If you need a Unicode string class with O(1) access on all platforms
-you should consider using wxUString.
+Accessing a UTF-8 string by index can be very inefficient because
+a single character is represented by a variable number of bytes so that
+the entire string has to be parsed in order to find the character.
+
+Since iterating over a string by index is a common programming technique and
+was also possible and encouraged by wxString using the access operator[]()
+wxString implements caching of the last used index so that iterating over
+a string is a linear operation even in UTF-8 mode.
 
-Since iterating over a wxString by index can become inefficient in UTF-8
-mode iterators should be used instead of index based access:
+It is nonetheless recommended to use iterators (instead of index bases
+access) like this:
 @code
 wxString s = "hello";
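
The added documentation says that wxString caches the last used index so that an index-based loop stays linear in UTF-8 mode. The following plain C++ sketch shows one way such a cache could work. It is an assumption-laden illustration, not the wxString implementation: the class `CachedUtf8Index` and its members are hypothetical, it operates on a `std::string` assumed to hold valid UTF-8, and it counts skipped bytes only to make the cost visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical illustration only; this is not wxString's actual cache.
class CachedUtf8Index
{
public:
    explicit CachedUtf8Index(std::string utf8) : m_str(std::move(utf8)) {}

    // Byte offset of the n-th character. The scan resumes from the cached
    // position when n is at or beyond it and restarts from byte 0 otherwise.
    std::size_t offsetOf(std::size_t n)
    {
        std::size_t ch = 0;
        std::size_t pos = 0;
        if (n >= m_lastChar) {
            ch = m_lastChar;
            pos = m_lastPos;
        }
        while (ch < n && pos < m_str.size()) {
            // Skip the lead byte and any 10xxxxxx continuation bytes.
            do {
                ++pos;
                ++m_steps;
            } while (pos < m_str.size() &&
                     (static_cast<unsigned char>(m_str[pos]) & 0xC0) == 0x80);
            ++ch;
        }
        // Remember where this lookup ended for the next call.
        m_lastChar = ch;
        m_lastPos = pos;
        return pos;
    }

    // Total number of bytes skipped by all lookups so far.
    std::size_t steps() const { return m_steps; }

private:
    std::string m_str;
    std::size_t m_lastChar = 0; // character index of the cached position
    std::size_t m_lastPos = 0;  // byte offset of the cached position
    std::size_t m_steps = 0;
};
```

With this cache, looking up indices 0, 1, 2, ... in increasing order skips each byte only once, so the whole loop is linear. A lookup that jumps backwards falls back to scanning from the start, which is why random access can still be expensive and why iterators remain the recommended approach.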