wxWidgets/utils/tex2rtf/docs/notes.txt

Implementation notes
--------------------
Files
-----
The library tex2any.lib contains the generic Latex parser.
It comprises tex2any.cc, tex2any.h and texutils.cc.
The executable Tex2RTF is made up of tex2any.lib,
tex2rtf.cc (main driver and user interface), and specific
drivers for generating output: rtfutils.cc, htmlutil.cc
and xlputils.cc.
Data structures
---------------
Class declarations are found in tex2any.h.
TexMacroDef holds a macro (Latex command) definition: name, identifier,
number of arguments, whether it should be ignored, etc. Integer
identifiers are used for each Latex command for efficiency when
generating output. A hash table MacroDefs stores all the TexMacroDefs,
indexed on command name.
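
As a rough illustration of the idea above (not the actual declarations in
tex2any.h), a macro definition and its name-indexed table might look like the
sketch below. The names MacroDefSketch, MacroTableSketch, kIdBold and
kIdSection are hypothetical, and std::unordered_map stands in for whatever
hash table class the real code uses.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical integer identifiers; the real ones live in tex2any.h.
constexpr int kIdBold = 1;
constexpr int kIdSection = 2;

// Hypothetical stand-in for TexMacroDef.
struct MacroDefSketch {
    std::string name;  // command name, without the backslash
    int id;            // integer identifier tested during generation
    int argCount;      // number of arguments the command takes
    bool ignore;       // whether the command should be ignored
};

// Hypothetical stand-in for the MacroDefs hash table.
class MacroTableSketch {
public:
    void Add(const MacroDefSketch& def) { defs_[def.name] = def; }

    // Returns nullptr when the command name is unknown.
    const MacroDefSketch* Find(const std::string& name) const {
        auto it = defs_.find(name);
        return it == defs_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, MacroDefSketch> defs_;
};
```

The lookup happens once, by name, at parse time; after that the generator
only needs to compare the integer identifier.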
Each unit of a Latex file is stored in a TexChunk. A TexChunk can be
a macro, argument or just a string: a TexChunk macro has child
chunks for the arguments, and each argument will have one or more
children for representing another command or a simple string.
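
The chunk hierarchy described above can be pictured with a small sketch. The
types ChunkKind and ChunkSketch and the helper functions are hypothetical
simplifications, not the real TexChunk class; the example builds the tree for
{\bf thing} as a macro with one argument holding one string.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class ChunkKind { Macro, Argument, String };

// Hypothetical stand-in for TexChunk.
struct ChunkSketch {
    ChunkKind kind = ChunkKind::String;
    std::string text;  // macro name, or literal text for a string chunk
    std::vector<std::unique_ptr<ChunkSketch>> children;
};

std::unique_ptr<ChunkSketch> MakeChunk(ChunkKind kind, std::string text = "") {
    auto chunk = std::make_unique<ChunkSketch>();
    chunk->kind = kind;
    chunk->text = std::move(text);
    return chunk;
}

// Builds: macro "bf" -> argument -> string "thing".
std::unique_ptr<ChunkSketch> BuildBoldExample() {
    auto macro = MakeChunk(ChunkKind::Macro, "bf");
    auto arg = MakeChunk(ChunkKind::Argument);
    arg->children.push_back(MakeChunk(ChunkKind::String, "thing"));
    macro->children.push_back(std::move(arg));
    return macro;
}
```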
Parsing
-------
Parsing is relatively ad hoc. read_a_line reads in a line at a time,
doing some processing for file commands (e.g. input, verbatiminclude).
File handles are stored in a stack so file input commands may be nested.
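
The stack of file handles can be sketched as follows. This is only an
illustration of the nesting idea: InputStackSketch is a hypothetical class,
in-memory strings stand in for real files, and only a line consisting of
exactly \input{name} is recognised, which is much simpler than what
read_a_line actually handles.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <stack>
#include <string>
#include <utility>

class InputStackSketch {
public:
    // 'files' maps a file name to its contents (a stand-in for the disk).
    explicit InputStackSketch(std::map<std::string, std::string> files)
        : files_(std::move(files)) {}

    // Pushes a new input source; returns false if the name is unknown.
    bool Open(const std::string& name) {
        auto it = files_.find(name);
        if (it == files_.end()) return false;
        streams_.push(std::make_unique<std::istringstream>(it->second));
        return true;
    }

    // Reads the next line, descending into \input{...} files and popping
    // finished ones. Returns false when all input is exhausted.
    bool ReadLine(std::string& line) {
        const std::string prefix = "\\input{";
        while (!streams_.empty()) {
            if (!std::getline(*streams_.top(), line)) {
                streams_.pop();  // this file is done; resume its parent
                continue;
            }
            if (line.compare(0, prefix.size(), prefix) == 0 &&
                line.back() == '}') {
                const std::string name = line.substr(
                    prefix.size(), line.size() - prefix.size() - 1);
                Open(name);  // unknown names are silently skipped here
                continue;
            }
            return true;
        }
        return false;
    }

private:
    std::map<std::string, std::string> files_;
    std::stack<std::unique_ptr<std::istringstream>> streams_;
};
```

Because the parent stream stays on the stack while the nested file is read,
reading resumes on the line after the input command once the nested file ends.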
ParseArg parses an argument, a single command, or a command argument;
the whole Latex input is itself treated as one argument. The parsing
gets a little hairy because an environment, a normal command and a
bracketed command (e.g. {\bf thing}) all get parsed into the same
format. An environment, for example, is usually a one-argument command,
as is {\bf thing}. ParseArg also deals with user-defined macros.
Whilst parsing, the function MatchMacro gets called to
attempt to find a command following a backslash (or the
start of an environment). ParseMacroBody parses the
arguments of a command when one is found.
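
A much reduced sketch of the matching step is given here. MatchCommandSketch
is a hypothetical function, not the real MatchMacro: it assumes command names
consist of letters only and ignores environments, and a std::set of known
names stands in for the MacroDefs table.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// If input[pos] starts a known command (backslash plus letters), stores the
// name and returns the number of characters consumed, including the
// backslash. Returns 0 when nothing matches.
std::size_t MatchCommandSketch(const std::string& input, std::size_t pos,
                               const std::set<std::string>& known,
                               std::string& name) {
    if (pos >= input.size() || input[pos] != '\\') return 0;
    std::size_t end = pos + 1;
    while (end < input.size() &&
           std::isalpha(static_cast<unsigned char>(input[end]))) {
        ++end;
    }
    const std::string candidate = input.substr(pos + 1, end - pos - 1);
    if (candidate.empty() || known.count(candidate) == 0) return 0;
    name = candidate;
    return end - pos;
}
```

A caller would then go on to parse as many arguments as the matched
definition declares.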
Generation
----------
The upshot of parsing is a hierarchy of TexChunks.
TraverseFromDocument, which the 'client' converter application
calls to start the generation process, calls the recursive
TraverseFromChunk. TraverseFromChunk calls the two functions
OnMacro and OnArgument twice for each chunk, once before and
once after its children are traversed, to allow for preprocessing
and postprocessing of each macro or argument.
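
The two-calls-per-chunk pattern can be sketched as a plain recursive walk.
NodeSketch, VisitFn and TraverseSketch are hypothetical names; a single
callback with a 'start' flag stands in for the OnMacro/OnArgument pair, under
the assumption that the first call precedes the children and the second
follows them.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified chunk: just a name and children.
struct NodeSketch {
    std::string name;
    std::vector<std::unique_ptr<NodeSketch>> children;
};

// Called with start == true before the children, false after them.
using VisitFn = std::function<void(const NodeSketch&, bool start)>;

void TraverseSketch(const NodeSketch& node, const VisitFn& visit) {
    visit(node, true);   // preprocessing, e.g. emit opening markup
    for (const auto& child : node.children) {
        TraverseSketch(*child, visit);
    }
    visit(node, false);  // postprocessing, e.g. emit closing markup
}
```

A client that writes an opening tag on the first call and a closing tag on
the second gets correctly nested output without tracking depth itself.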
The client defines OnMacro and OnArgument to test
the command identifier, and output the appropriate
code. To help do this, the function TexOutput
outputs to the current stream(s), and
SetCurrentOutput(s) allows the setting of one
or two output streams for the output to be sent to.
Usually two outputs at a time are sufficient for
hypertext applications where a title is likely
to appear in an index and as a section header.
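
The one-or-two-streams arrangement might be pictured like this.
DualOutputSketch and its methods are hypothetical and use C++ streams; they
are not the real TexOutput or SetCurrentOutput(s) signatures.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class DualOutputSketch {
public:
    // Sets one or two destinations; the second is optional.
    void SetOutputs(std::ostream* first, std::ostream* second = nullptr) {
        first_ = first;
        second_ = second;
    }

    // Writes the text to every stream currently set.
    void Write(const std::string& text) {
        if (first_) *first_ << text;
        if (second_) *second_ << text;
    }

private:
    std::ostream* first_ = nullptr;
    std::ostream* second_ = nullptr;
};

// Usage: a title goes to both the body and the index, the following text to
// the body only. Returns what ended up in the index stream.
std::string DemoIndexTextSketch() {
    std::ostringstream body, index;
    DualOutputSketch out;
    out.SetOutputs(&body, &index);
    out.Write("Introduction");
    out.SetOutputs(&body);
    out.Write(" Body text.");
    return index.str();
}
```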
There are support functions for getting the string
data for the current chunk (GetArgData) and the
current chunk itself (GetArgChunk). If you have a handle
on a chunk, you can output it several times by calling
TraverseChildrenFromChunk (not TraverseFromChunk because
that causes infinite recursion).
The client (here, Tex2RTF) also defines OnError and OnInform output
functions appropriate to the desired user interface.
References
----------
Adding, finding and resolving references are supported
with functions from texutils.cc. WriteTexReferences
and ReadTexReferences allow saving and reading references
between conversion processes, rather like real LaTeX.
Bibliography
------------
Again texutils.cc provides functions for reading in .bib files and
resolving references. The function OutputBibItem gives a generic way
of outputting bibliography items, by 'faking' calls to OnMacro and
OnArgument, allowing the existing low-level client code to take care of
formatting.
Units
-----
Unit parsing code is in texutils.cc as ParseUnitArgument. It converts
units to points.
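
A minimal sketch of unit conversion follows. UnitToPointsSketch is a
hypothetical function, not the real ParseUnitArgument, and the conversion
factors are an assumption: they use TeX's convention of 72.27 points per inch,
which may differ from what Tex2RTF actually uses.

```cpp
#include <cstdlib>
#include <string>

// Parses text such as "2.5cm" and stores the length in points.
// Returns false if there is no leading number or the unit is not recognised.
bool UnitToPointsSketch(const std::string& text, double& points) {
    char* end = nullptr;
    const double value = std::strtod(text.c_str(), &end);
    if (end == text.c_str()) return false;  // no number found

    const std::string unit(end);
    double factor = 0.0;
    if (unit == "pt") factor = 1.0;
    else if (unit == "in") factor = 72.27;         // assumed: TeX points
    else if (unit == "cm") factor = 72.27 / 2.54;
    else if (unit == "mm") factor = 72.27 / 25.4;
    else if (unit == "pc") factor = 12.0;
    else return false;

    points = value * factor;
    return true;
}
```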
Common errors
-------------
1) Macro not found: \end{center} ...
Rewrite:
    \begin{center}
    {\large{\underline{A}}}
    \end{center}
as:
    \begin{center}
    {\large \underline{A}}
    \end{center}
2) Tables crash RTF. Set 'compatibility' to TRUE in the .ini file. Also
check for \\ end-of-row characters on their own on a line, and insert the
correct number of ampersands for the number of columns. E.g.
    hello & world\\
    \\
becomes
    hello & world\\
    &\\
3) If list items indent erratically, try increasing
listItemIndent to give more space between label and following text.
A global replace of '\item [' to '\item[' may also be helpful to remove
unnecessary space before the item label.
4) Missing figure or section references: ensure all labels _directly_ follow captions
or sections (no intervening white space).
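
The table fix in item 2 can be applied mechanically. The sketch below is a
hypothetical helper, not part of Tex2RTF: given the number of columns, it
rewrites a line that contains only the \\ end-of-row marker so that it
carries one ampersand per column separator, and leaves every other line alone.

```cpp
#include <cstddef>
#include <string>

// Pads a row consisting solely of "\\" (ignoring surrounding blanks) with
// columns - 1 ampersands. Any other line is returned unchanged.
std::string PadEmptyRowSketch(const std::string& line, int columns) {
    const std::size_t first = line.find_first_not_of(" \t");
    if (first == std::string::npos) return line;  // blank line
    const std::size_t last = line.find_last_not_of(" \t");
    if (line.substr(first, last - first + 1) != "\\\\") return line;

    const std::size_t separators =
        columns > 1 ? static_cast<std::size_t>(columns - 1) : 0;
    return std::string(separators, '&') + "\\\\";
}
```

For the two-column table in item 2, the lone \\ row becomes &\\.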