%
% automatically generated by HelpGen from
% htmlparser.tex at 14/Mar/99 20:13:37
%
\section{\class{wxHtmlParser}}\label{wxhtmlparser}

This class handles the {\bf generic} parsing of an HTML document: it scans
the document and divides it into blocks of tags (where one block
consists of a beginning tag, an ending tag and the text between these
two tags).

It is independent of wxHtmlWindow and can be used as a stand-alone parser
(for example in Julian Smart's idea of a speech-only HTML viewer, or in a
wget-like utility; see the InetGet sample).

It uses a system of tag handlers to parse the HTML document. Tag handlers
are not statically shared by all instances but are created for each
wxHtmlParser instance. The reason is that a handler may contain
document-specific temporary data used during parsing (e.g. complicated
structures like tables).

Typically the user calls only the \helpref{Parse}{wxhtmlparserparse} method.

\wxheading{Derived from}

wxObject

\wxheading{Include files}

<wx/html/htmlpars.h>

\wxheading{See also}

\helpref{Cells Overview}{cells},
\helpref{Tag Handlers Overview}{handlers},
\helpref{wxHtmlTag}{wxhtmltag}

\latexignore{\rtfignore{\wxheading{Members}}}

\membersection{wxHtmlParser::wxHtmlParser}\label{wxhtmlparserwxhtmlparser}

\func{}{wxHtmlParser}{\void}

Constructor.

\membersection{wxHtmlParser::AddTag}\label{wxhtmlparseraddtag}

\func{void}{AddTag}{\param{const wxHtmlTag\& }{tag}}

This method may, but need not, be overridden in a derived class.

This method is called each time a new tag is about to be added.
{\it tag} contains information about the tag. (See \helpref{wxHtmlTag}{wxhtmltag}
for details.)

The default (wxHtmlParser) behaviour is as follows:
it first finds a handler capable of handling this tag and then calls
that handler's HandleTag method.
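
The lookup-then-dispatch behaviour described above can be pictured with a
small stand-alone model. This is only a sketch: ToyTag, ToyHandler,
RecordingHandler and ToyParser are hypothetical names invented for the
illustration, they are not wxWidgets classes, and the real parser may
organise its handler table differently.

\begin{verbatim}
#include <map>
#include <string>
#include <vector>

// Toy stand-ins invented for this sketch; NOT the wxWidgets classes.
struct ToyTag {
    std::string name;   // tag name, e.g. "B"
};

struct ToyHandler {
    virtual ~ToyHandler() {}
    virtual bool HandleTag(const ToyTag& tag) = 0;
};

// Records the name of every tag it is asked to handle.
struct RecordingHandler : ToyHandler {
    std::vector<std::string> seen;
    bool HandleTag(const ToyTag& tag) override {
        seen.push_back(tag.name);
        return false;
    }
};

class ToyParser {
public:
    // Registers handler for one tag name (the toy parser does not own it).
    void AddTagHandler(const std::string& tagName, ToyHandler* handler) {
        m_handlers[tagName] = handler;
    }

    // Finds a handler for the tag, then calls its HandleTag method.
    // Returns false when no handler is registered for the tag.
    bool AddTag(const ToyTag& tag) {
        std::map<std::string, ToyHandler*>::iterator it =
            m_handlers.find(tag.name);
        if (it == m_handlers.end())
            return false;
        it->second->HandleTag(tag);
        return true;
    }

private:
    std::map<std::string, ToyHandler*> m_handlers;
};
\end{verbatim}

In this model a tag with no registered handler is simply reported as
unhandled; what the real parser does in that case is not specified here.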

\membersection{wxHtmlParser::AddTagHandler}\label{wxhtmlparseraddtaghandler}

\func{virtual void}{AddTagHandler}{\param{wxHtmlTagHandler }{*handler}}

Adds a handler to the internal list (and hash table) of handlers. This
method should not be called directly by the user but rather by the derived
class's constructor.

This adds the handler to this {\bf instance} of wxHtmlParser, not to
all objects of this class! (A static front-end to AddTagHandler is provided
by wxHtmlWinParser.)

All handlers are deleted on object deletion.

\membersection{wxHtmlParser::AddText}\label{wxhtmlparseraddword}

\func{virtual void}{AddText}{\param{const char* }{txt}}

Must be overridden in a derived class.

This method is called by \helpref{DoParsing}{wxhtmlparserdoparsing}
each time a part of the text is parsed. {\it txt} is NOT only one word; it is
a substring of the input. It is not formatted or preprocessed (so white space is
left unmodified).

\membersection{wxHtmlParser::DoParsing}\label{wxhtmlparserdoparsing}

\func{void}{DoParsing}{\param{int }{begin\_pos}, \param{int }{end\_pos}}

\func{void}{DoParsing}{\void}

Parses m\_Source from begin\_pos to end\_pos-1.
(The version without parameters parses the whole of m\_Source.)

\membersection{wxHtmlParser::DoneParser}\label{wxhtmlparserdoneparser}

\func{virtual void}{DoneParser}{\void}

This must be called after DoParsing().

\membersection{wxHtmlParser::GetFS}\label{wxhtmlparsergetfs}

\constfunc{wxFileSystem*}{GetFS}{\void}

Returns a pointer to the file system. Because each tag handler has a
reference to its parent parser, it can easily request a file by
calling

\begin{verbatim}
wxFSFile *f = m_Parser -> GetFS() -> OpenFile("image.jpg");
\end{verbatim}

\membersection{wxHtmlParser::GetProduct}\label{wxhtmlparsergetproduct}

\func{virtual wxObject*}{GetProduct}{\void}

Returns the product of parsing, i.e. the result of parsing
the document. The type of this result depends on the internal
representation in the derived parser (but it must be derived from wxObject!).

See wxHtmlWinParser for details.

\membersection{wxHtmlParser::GetSource}\label{wxhtmlparsergetsource}

\func{wxString*}{GetSource}{\void}

Returns a pointer to the source being parsed.

\membersection{wxHtmlParser::InitParser}\label{wxhtmlparserinitparser}

\func{virtual void}{InitParser}{\param{const wxString\& }{source}}

Sets up the parser for parsing the {\it source} string. (Should be overridden
in a derived class.)

\membersection{wxHtmlParser::OpenURL}\label{wxhtmlparseropenurl}

\func{virtual wxFSFile*}{OpenURL}{\param{wxHtmlURLType }{type}, \param{const wxString\& }{url}}

Opens the given URL and returns a {\tt wxFSFile} object that can be used to read data
from it. This method may return NULL in one of two cases: either the URL doesn't
point to any valid resource or the URL is blocked by an overridden implementation
of {\it OpenURL} in a derived class.

\wxheading{Parameters}

\docparam{type}{Indicates the type of the resource. It is one of:
\begin{twocollist}\itemsep=0pt
\twocolitem{{\bf wxHTML\_URL\_PAGE}}{Opening an HTML page.}
\twocolitem{{\bf wxHTML\_URL\_IMAGE}}{Opening an image.}
\twocolitem{{\bf wxHTML\_URL\_OTHER}}{Opening a resource that doesn't fall into
any other category.}
\end{twocollist}}

\docparam{url}{URL being opened.}

\wxheading{Notes}

Always use this method in tag handlers instead of {\tt GetFS()->OpenFile()}
because it can block the URL and is thus more secure.

The default behaviour is to call \helpref{wxHtmlWindow::OnOpeningURL}{wxhtmlwindowonopeningurl}
of the associated wxHtmlWindow object, if there is one (it may decide to block the URL or
redirect it to another one), and to always open the URL if the
parser is not used with wxHtmlWindow.

The returned {\tt wxFSFile} object is not guaranteed to point to {\it url}; it might
have been redirected!

\membersection{wxHtmlParser::Parse}\label{wxhtmlparserparse}

\func{wxObject*}{Parse}{\param{const wxString\& }{source}}

Parses the document. This is the end-user method: simply
call it when you need to obtain the parsed output (which is parser-specific).

The method does these things:

\begin{enumerate}\itemsep=0pt
\item calls \helpref{InitParser(source)}{wxhtmlparserinitparser}
\item calls \helpref{DoParsing}{wxhtmlparserdoparsing}
\item calls \helpref{GetProduct}{wxhtmlparsergetproduct}
\item calls \helpref{DoneParser}{wxhtmlparserdoneparser}
\item returns the value returned by GetProduct
\end{enumerate}

You shouldn't use InitParser, DoParsing, GetProduct or DoneParser directly.
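
The call order listed above can be sketched as a stand-alone toy class.
ToyParserBase, its log member and the use of std::string as the product
type are assumptions made for this illustration; the sketch models only
the sequence of calls and is not the wxWidgets implementation.

\begin{verbatim}
#include <string>
#include <vector>

// Toy model of the call order only; NOT the wxWidgets implementation.
class ToyParserBase {
public:
    virtual ~ToyParserBase() {}

    // Runs the four steps in the documented order and returns the product.
    std::string Parse(const std::string& source) {
        InitParser(source);
        DoParsing();
        std::string result = GetProduct();  // fetched before cleanup
        DoneParser();
        return result;
    }

    std::vector<std::string> log;  // order in which the steps ran

protected:
    virtual void InitParser(const std::string& source) {
        log.push_back("InitParser");
        m_source = source;
    }
    virtual void DoParsing() { log.push_back("DoParsing"); }
    // The toy product is just a copy of the source text.
    virtual std::string GetProduct() {
        log.push_back("GetProduct");
        return m_source;
    }
    virtual void DoneParser() {
        log.push_back("DoneParser");
        m_source.clear();
    }

    std::string m_source;
};
\end{verbatim}

Because the product is fetched before DoneParser runs, cleanup in
DoneParser cannot invalidate the value handed back to the caller in this
model.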

\membersection{wxHtmlParser::PushTagHandler}\label{wxhtmlparserpushtaghandler}

\func{void}{PushTagHandler}{\param{wxHtmlTagHandler* }{handler}, \param{wxString }{tags}}

Forces the handler to handle additional tags
(not returned by \helpref{GetSupportedTags}{wxhtmltaghandlergetsupportedtags}).
The handler should already be added to this parser.

\wxheading{Parameters}

\docparam{handler}{the handler}
\docparam{tags}{List of tags (in the same format as GetSupportedTags's return value). The parser
will redirect these tags to {\it handler} (until the next call to \helpref{PopTagHandler}{wxhtmlparserpoptaghandler}). }

\wxheading{Example}

Imagine you want to parse the following pseudo-HTML structure:

\begin{verbatim}
<myitems>
    <param name="one" value="1">
    <param name="two" value="2">
</myitems>

<execute>
    <param program="text.exe">
</execute>
\end{verbatim}

It is obvious that you cannot use only one tag handler for the <param> tag.
Instead you must use context-sensitive handlers for <param> inside <myitems>
and <param> inside <execute>.

This is the preferred solution:

\begin{verbatim}
TAG_HANDLER_BEGIN(MYITEM, "MYITEMS")
    TAG_HANDLER_PROC(tag)
    {
        // ...something...

        m_Parser -> PushTagHandler(this, "PARAM");
        ParseInner(tag);
        m_Parser -> PopTagHandler();

        // ...something...
    }
TAG_HANDLER_END(MYITEM)
\end{verbatim}
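
The effect of the push and pop calls on tag dispatch can be pictured with a
small stand-alone model. ToyDispatch and HandlerFor are hypothetical names
invented for this sketch, handlers are reduced to plain string labels, and
only a single tag name is accepted where the real method takes a list of
tags; how wxHtmlParser stores its saved state internally is not shown here.

\begin{verbatim}
#include <map>
#include <string>
#include <vector>

// Toy model; NOT the wxWidgets classes. Handlers are plain labels here.
class ToyDispatch {
public:
    void AddTagHandler(const std::string& tag, const std::string& handler) {
        m_table[tag] = handler;
    }

    // Saves the current table, then redirects tag to handler.
    void PushTagHandler(const std::string& handler, const std::string& tag) {
        m_saved.push_back(m_table);
        m_table[tag] = handler;
    }

    // Restores the table saved by the matching PushTagHandler call.
    void PopTagHandler() {
        if (m_saved.empty())
            return;
        m_table = m_saved.back();
        m_saved.pop_back();
    }

    // Returns the handler label for tag, or "" if none is registered.
    std::string HandlerFor(const std::string& tag) const {
        std::map<std::string, std::string>::const_iterator it =
            m_table.find(tag);
        return it == m_table.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> m_table;
    std::vector<std::map<std::string, std::string> > m_saved;
};
\end{verbatim}

In this model, PARAM is routed to the pushing handler only between the push
and the matching pop, which is the context-sensitive behaviour the example
above relies on.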

\membersection{wxHtmlParser::PopTagHandler}\label{wxhtmlparserpoptaghandler}

\func{void}{PopTagHandler}{\void}

Restores the parser's state to what it was before the last call to
\helpref{PushTagHandler}{wxhtmlparserpushtaghandler}.

\membersection{wxHtmlParser::SetFS}\label{wxhtmlparsersetfs}

\func{void}{SetFS}{\param{wxFileSystem }{*fs}}

Sets the virtual file system that will be used to request additional
files. (For example, the {\tt <IMG>} tag handler requests a wxFSFile with the
image data.)