May 04: Parsing the XML Stream


The XML parser handles simple XML files and is not meant to deal with the more advanced features of XML (attributes, namespaces, and so on). It is built to parse files similar to Example 1. Its main advantages are that the code is platform independent, relies entirely on the STL, and is quite fast. The XML stream is handled entirely by two classes: Parser and Observer. The Parser uses the following algorithm:

1. Find "<" in the stream text.
2. Find the following ">" in the stream text.
3. Extract the name and attributes between the "<" and ">" just found.
4. Discard tags of the form <!-- comment --> or <?xml some-text?>; go to Step 1.
5. Find the closing tag "</name>".
6. Check whether there is a tag between "<name attributes>" and "</name>". If there is, we found a node: the parser sends a FoundNode event to the Observer, parses the text between the node tags, and then sends an EndNode event to the Observer. If not, we found an element, and the parser sends a FoundElement event to the Observer.
7. Continue parsing from Step 1 until the end of the stream.
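The seven steps above can be sketched in C++ using only std::string::find and substr. This is a minimal illustration under stated assumptions, not the ParamIO source: the Observer interface and its method names (foundNode, foundElement, endNode), the parseRange/parse helpers, and the EventLog recorder are all hypothetical, and attributes are passed on as the raw text after the tag name. Like the heuristic it mirrors, the sketch fails on nested tags that share a name and on self-closing tags such as <a/>.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical event interface; the three events mirror FoundNode,
// FoundElement and EndNode, but these signatures are assumptions.
struct Observer {
    virtual ~Observer() = default;
    virtual void foundNode(const std::string& name, const std::string& attrs) = 0;
    virtual void foundElement(const std::string& name, const std::string& attrs,
                              const std::string& value) = 0;
    virtual void endNode(const std::string& name) = 0;
};

// Applies the seven steps to text[begin, end). Returns false when a tag is
// unterminated or its closing tag cannot be found inside the range.
bool parseRange(const std::string& text, std::size_t begin, std::size_t end,
                Observer& obs) {
    std::size_t pos = begin;
    while (true) {
        std::size_t open = text.find('<', pos);                 // Step 1
        if (open == std::string::npos || open >= end) return true;
        std::size_t close = text.find('>', open);               // Step 2
        if (close == std::string::npos || close >= end) return false;
        // Step 3: everything between "<" and ">".
        std::string tag = text.substr(open + 1, close - open - 1);
        if (!tag.empty() && (tag[0] == '!' || tag[0] == '?')) { // Step 4
            pos = close + 1;
            continue;
        }
        // Step 3 (cont.): split into name and raw attribute text.
        std::size_t space = tag.find(' ');
        std::string name = tag.substr(0, space);
        std::string attrs = space == std::string::npos ? "" : tag.substr(space + 1);
        std::string closing = "</" + name + ">";                // Step 5
        std::size_t closeTag = text.find(closing, close + 1);
        if (closeTag == std::string::npos || closeTag + closing.size() > end)
            return false;
        if (text.find('<', close + 1) < closeTag) {             // Step 6: node
            obs.foundNode(name, attrs);
            if (!parseRange(text, close + 1, closeTag, obs)) return false;
            obs.endNode(name);
        } else {                                                // Step 6: element
            obs.foundElement(name, attrs, text.substr(close + 1, closeTag - close - 1));
        }
        pos = closeTag + closing.size();                        // Step 7
    }
}

// Convenience wrapper for a whole stream.
bool parse(const std::string& text, Observer& obs) {
    return parseRange(text, 0, text.size(), obs);
}

// Example observer that records each event as a string.
struct EventLog : Observer {
    std::vector<std::string> events;
    void foundNode(const std::string& name, const std::string&) override {
        events.push_back("FoundNode " + name);
    }
    void foundElement(const std::string& name, const std::string&,
                      const std::string& value) override {
        events.push_back("FoundElement " + name + "=" + value);
    }
    void endNode(const std::string& name) override {
        events.push_back("EndNode " + name);
    }
};
```

Feeding `<params><width>640</width></params>` to parse with an EventLog records "FoundNode params", "FoundElement width=640", and "EndNode params", in that order. Recursing on the text between a node's tags is what produces the nesting of FoundNode/EndNode pairs.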

This heuristic can fail on badly formed XML files but works well with files generated by ParamIO (the goal is not to parse every possible XML file). The Observer uses the events sent by the Parser to build an in-memory representation of the XML stream. The natural representation is a tree made of two kinds of objects: elements, which have a name, a value, and attributes (all stored as strings); and nodes, which may contain subnodes and elements and hold an iterator pointing to their parent.
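One way to express this tree in C++ is sketched below. It is an illustration under assumptions rather than ParamIO's actual classes: the names Element, Node, NodeIt, addSubnode, and path are hypothetical, and attributes are kept as a single unparsed string. Subnodes live in a std::list because list iterators stay valid when siblings are appended, which is what makes it safe for a node to keep an iterator to its parent. The sketch needs C++17 (std::optional, and std::list of a still-incomplete type).

```cpp
#include <iterator>
#include <list>
#include <optional>
#include <string>

// Leaf of the tree: name, value and attributes, all kept as strings
// (attributes are stored here as the raw text found in the tag).
struct Element {
    std::string name, value, attributes;
};

// Inner node: named, holds subnodes and elements, and remembers its parent.
// std::list keeps iterators valid when siblings are appended, so a parent
// iterator stays usable while the tree grows. (Moving or copying a node that
// already has children would leave the children's parent iterators stale.)
struct Node {
    std::string name;
    std::list<Node> subnodes;
    std::list<Element> elements;
    std::optional<std::list<Node>::iterator> parent;  // empty for the root
};

using NodeIt = std::list<Node>::iterator;

// Appends an empty subnode to *parent and returns an iterator to it.
NodeIt addSubnode(NodeIt parent, const std::string& name) {
    parent->subnodes.push_back(Node{name, {}, {}, parent});
    return std::prev(parent->subnodes.end());
}

// Builds a path such as "params/window/size" by following parent iterators.
std::string path(NodeIt node) {
    std::string result = node->name;
    while (node->parent) {
        node = *node->parent;
        result = node->name + "/" + result;
    }
    return result;
}
```

A plain Node* parent pointer would work just as well with std::list; the iterator form simply follows the description above.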

The Observer keeps a pointer to the current tree node. On a FoundNode or FoundElement event, it adds a node or an element, respectively, to the current node; after a FoundNode event, the new node becomes the current one. On an EndNode event, it moves the pointer back to the parent of the current node. Once the tree is built, the read and write methods access and modify the tree directly.
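The event-driven construction can be sketched as an Observer subclass that tracks the current node with a list iterator. The TreeBuilder class, the Observer method names, and the findElement/readValue/writeValue helpers are hypothetical stand-ins for ParamIO's classes and its read and write methods; their signatures are assumptions, not the library's API. An unnamed synthetic root collects the top-level nodes, and node attributes are discarded for brevity.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical event interface mirroring FoundNode, FoundElement and EndNode.
struct Observer {
    virtual ~Observer() = default;
    virtual void foundNode(const std::string& name, const std::string& attrs) = 0;
    virtual void foundElement(const std::string& name, const std::string& attrs,
                              const std::string& value) = 0;
    virtual void endNode(const std::string& name) = 0;
};

struct Element {
    std::string name, value, attributes;  // attributes: raw, unparsed text
};

struct Node {
    std::string name;
    std::list<Node> subnodes;
    std::list<Element> elements;
    std::optional<std::list<Node>::iterator> parent;  // empty for the root
};

// Builds the tree from parser events by tracking the current node.
class TreeBuilder : public Observer {
public:
    TreeBuilder() {
        root_.push_back(Node{"", {}, {}, std::nullopt});  // synthetic root
        current_ = root_.begin();
    }
    TreeBuilder(const TreeBuilder&) = delete;             // a copy would keep
    TreeBuilder& operator=(const TreeBuilder&) = delete;  // a dangling current_

    void foundNode(const std::string& name, const std::string&) override {
        current_->subnodes.push_back(Node{name, {}, {}, current_});
        current_ = std::prev(current_->subnodes.end());  // descend into new node
    }
    void foundElement(const std::string& name, const std::string& attrs,
                      const std::string& value) override {
        current_->elements.push_back(Element{name, value, attrs});
    }
    void endNode(const std::string&) override {
        if (current_->parent) current_ = *current_->parent;  // back to parent
    }
    Node& root() { return root_.front(); }

private:
    std::list<Node> root_;
    std::list<Node>::iterator current_;
};

// Walks down the named subnodes, then looks up the named element.
Element* findElement(Node& start, const std::vector<std::string>& nodes,
                     const std::string& element) {
    Node* node = &start;
    for (const std::string& name : nodes) {
        auto it = std::find_if(node->subnodes.begin(), node->subnodes.end(),
                               [&name](const Node& n) { return n.name == name; });
        if (it == node->subnodes.end()) return nullptr;
        node = &*it;
    }
    auto it = std::find_if(node->elements.begin(), node->elements.end(),
                           [&element](const Element& e) { return e.name == element; });
    return it == node->elements.end() ? nullptr : &*it;
}

// Hypothetical "read": the stored value, or `fallback` if it is missing.
std::string readValue(Node& root, const std::vector<std::string>& nodes,
                      const std::string& element, const std::string& fallback) {
    const Element* e = findElement(root, nodes, element);
    return e ? e->value : fallback;
}

// Hypothetical "write": changes an existing element; false if it is missing.
bool writeValue(Node& root, const std::vector<std::string>& nodes,
                const std::string& element, const std::string& value) {
    Element* e = findElement(root, nodes, element);
    if (e == nullptr) return false;
    e->value = value;
    return true;
}
```

Replaying the events for `<params><window><width>640</width></window></params>` into a TreeBuilder and then calling `readValue(builder.root(), {"params", "window"}, "width", "")` returns "640".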

— A.B. and A.S.
