XML-based Factories and Reusable Components

Gualtiero Chiaia


In the real world, a factory is a physical entity specializing in the automated production of goods, all having well specified features. Most often, the production process consists of the assembling of third-party sub-components, also having well specified features. Typically, a production process is repeated many times, resulting in a collection of (almost) identical items. These items are stored in warehouses and labelled with a unique tag before reaching the retail chains. To run this factory successfully, the CEO needs to be fully aware of the production process, the complete sequence of steps required, and the commodities used during production.

In the OO (object-oriented) world, a factory is an object of a class specializing in the automated construction of objects of a different class. Most often, the construction of complex objects consists of assembling library components (objects of other classes, either belonging to the library or to third-party libraries). A class factory can construct many objects. These objects can be stored in persistent containers, in order to be used by different clients, or reused several times. The objects are associated with unique identifiers, in order to be identified and easily retrieved. The information required by the factory’s construction procedure (the build member function) can be passed to the factory object in many ways. An efficient way to do so is by structured XML files or strings.

To extend this analogy even further, traditional factories could not exist without an associated warehouse and a distribution system, both because the customers’ demand is not known with precision beforehand and because of economies of scale. Moreover, the economic separation between production and distribution increases the efficiency of the system, because different providers can work concurrently.

Similarly, in the OO world, when a class factory is required to automate object construction, a persistent container for the factory’s end products is invariably needed. First, as in the real-world case, the demand for the binary components by different clients, or by the same client at different times, is likely to be unknown. Second, complex software systems typically deal with objects whose life span extends beyond the destruction of the factory object itself. In fact, the class factory is typically set into action by a client interface, but the objects constructed by the factory should outlive the client interface itself. So object persistence and automated construction by factories often go hand in hand.

This article describes a typical way of using factory objects to extract information from an XML source and to construct persistent objects for a specific task (to price financial contracts in this case). Figure 1 shows a sketch of the project.

All the information describing the financial contract is contained in an XML document. This is the input for a specialized factory object (i.e., a class object that inherits the interface of an abstract class factory). The role of this factory object is to build different sub-components required to price the contract (e.g., a time-line containing all the important dates of the contract, a pricing engine object, a discount factor curve, and others). Most of these components can and should be reused by other contracts, for efficiency considerations. Therefore the factory should also be responsible for storing these components in a container object, having a life span independent from the life span of the factory itself.

Not all software architectures require class factories, but the advantages of this approach are often appealing. In my opinion, this is true for those architectures where specialization (OO inheritance) and aggregation (containment through member pointers to other library components) are used in conjunction and are well balanced. In these situations, the object library is typically organized in several shallow class hierarchies, with constructors of one class receiving pointers to abstract base classes at the root of a different hierarchy. This can result in complicated chain dependencies in the construction process, where the constructor of a complex component requires data and pointers to simpler components, which also require data and pointers to more primitive components, and so on. Here the automation of the construction process by means of class factories offers the following advantages:

In addition, the combination of XML input data and class factories offers the advantage of structuring the raw data consumed by the factory in a rational and standard way. In fact the natural structure of the input data, used to initialize complex objects, can be easily mapped to XML tree structures, and the extensibility of the XML protocol allows reuse of XML protocols already established in the market. In finance, for example, two XML protocols for describing financial contracts are emerging: FpML [1] and FIXML [2].

Program Flow

In the project described here, a persistent container object is first created when the library is loaded by the main application. When all the information describing the financial contract is gathered by the GUI application (Java front end), the XML document is formed and used to construct the specific factory object. The factory interrogates the container for reusable components and stores new components in it. The financial instrument is then priced and the factory object is eventually discarded. The reusable components however outlive the client and are finally deleted when the main application (the server) ends.
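
The life cycle described above can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the library's code: Component, Container, Factory, and the "curve:USD" identifier are hypothetical stand-ins, and std::unique_ptr (C++11) handles ownership here, whereas the container presented later in this article stores raw pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical reusable component, e.g. a discount factor curve.
struct Component {
    std::string id;
};

// Hypothetical container: owns the components and outlives every factory.
class Container {
public:
    Component* find(const std::string& id) const {
        auto it = items_.find(id);
        return it == items_.end() ? nullptr : it->second.get();
    }
    Component* store(std::unique_ptr<Component> c) {
        assert(c && "cannot store a null component");
        Component* raw = c.get();
        items_[raw->id] = std::move(c);
        return raw;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::map<std::string, std::unique_ptr<Component>> items_;
};

// Hypothetical factory: asks the container first, builds only when needed.
class Factory {
public:
    explicit Factory(Container& c) : container_(c) {}
    Component* build(const std::string& id) {
        if (Component* existing = container_.find(id)) return existing;
        ++built_;
        return container_.store(std::unique_ptr<Component>(new Component{id}));
    }
    int built() const { return built_; }
private:
    Container& container_;
    int built_ = 0;
};

// Two short-lived factories share one long-lived container.
int componentsBuiltForTwoContracts(Container& container) {
    int built = 0;
    {
        Factory first(container);
        first.build("curve:USD");
        built += first.built();
    }   // the first factory is discarded; its component stays in the container
    {
        Factory second(container);
        second.build("curve:USD");   // found in the container, not rebuilt
        built += second.built();
    }
    return built;
}
```

In this sketch the second factory finds the curve already in the container, so only one component is ever constructed, even though two factories have come and gone.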

XML and DOM Parsers

XML is a powerful format for transferring data. In the financial library that I have designed and developed, different Java client applications are used to form valid XML files or strings, containing the typically large set of data that describes a financial contract. This XML information is sent through JNI (the Java Native Interface) to the C++ layer, where it is parsed back into C++ objects by means of a specialized factory object.

Although this flow of data can occur on a single PC, the separation between the user interface (a Java GUI in this case) and the C++ library favors a client/server architecture where the complex numerical evaluation of the financial contract is performed on the server. XML helps greatly in achieving this logical and physical separation. For this purpose, an efficient XML parser is required to retrieve the information. In the code presented here, I use the API provided by the Xerces parser, developed by the Apache XML project and freely available from the Internet [3]. Many other open source parsers are freely available, but a comparison among them is beyond the scope of this article.

Most XML parsers implement two interfaces: the SAX (Simple API for XML) interface and the DOM (Document Object Model) interface, which is typically implemented on top of a SAX-style sequential parse. SAX parsers are event driven and access the XML data serially. An application that uses a SAX parser registers an event handler and defines callback functions, which are called whenever an XML tag is opened or closed during the sequential reading of the input. This parsing mechanism has speed on its side, but it is memory-less and does not allow editing of an in-memory representation of the XML data. DOM parsers, on the other hand, create a tree-like representation of the source file in memory, transforming the XML document into a hierarchy of objects. This provides random access, editing, and insertion and removal of data from the tree, but it is less efficient than SAX because it involves a two-step process: population of the tree by a SAX-like parse, followed by tree-walking to edit and extract the data. However, for small- to medium-size XML documents, the OO advantages of DOM over SAX outweigh this inefficiency. A Java tutorial web page [4] provides a thorough analysis and comparison of SAX and DOM parsers. The DOM API is a W3C specification [5] and is language-independent (defined using the OMG IDL).

The class factory WU_XMLFactory, which I developed, is described in Listings 1 (header file) and 2 (source file). Some member functions of this class provide simplified tree navigation of the DOM document, extracting data from the document in four typical situations:

  1. getNodeValue: a single value is expected within a tag whose name is unambiguous in the document.
  2. getSubNodeValue: a single value is expected within a tag whose name is used in more than one context, so a parent (or ancestor) node is needed to resolve it.
  3. getNodeArray: an array of values is expected within an unambiguous tag.
  4. getNodeSubArray: an array of values is expected within a tag that must be resolved through a parent node.

Some of these functions use a recursive node-searching member function, DOM_Node getNodeRecursive(DOM_Node parent, string subNodeName), which I will describe later.

WU_XMLFactory’s constructor takes a string as the first parameter, containing either the XML input itself or the name and path of the XML document. Which of these two alternatives applies is determined by the value of the third parameter, the XMLSource enumeration, which can be either XML_MEMORY or XML_FILE for the two cases, respectively. The second parameter of the constructor is a pointer to a persistent container, used by the build member function to store the components and sub-components created by the factory (described in the next section).
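
The role of the XMLSource parameter can be illustrated without any parser at all. The sketch below is an assumption-laden simplification: the real constructor hands the input to the Xerces parser, whereas the hypothetical function loadXmlText merely returns the XML text, taken either directly from the string or from the file the string names. The enumerator names come from the text above; their order and numeric values are assumed.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Enumerator names as in the article; the underlying values are an assumption.
enum XMLSource { XML_MEMORY, XML_FILE };

// Hypothetical helper: interpret `input` as XML text or as a file path.
std::string loadXmlText(const std::string& input, XMLSource source) {
    if (source == XML_MEMORY) return input;        // input already is the XML text
    std::ifstream file(input.c_str());             // input is the name and path
    if (!file) throw std::runtime_error("cannot open XML file: " + input);
    std::ostringstream text;
    text << file.rdbuf();                          // read the whole file
    return text.str();
}

// Usage: the same string argument means different things for the two sources.
void xmlSourceExample() {
    assert(loadXmlText("<equity/>", XML_MEMORY) == "<equity/>");
    bool threw = false;
    try {
        loadXmlText("no/such/file.xml", XML_FILE);
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```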

Within the constructor scope, an object of class DOMParser is created and used to parse the input, either from file (dp.parse(fs)) or memory (dp.parse(ms)). Finally, if no parsing exceptions are raised, the parser object is used to initialize the class member m_doc. This is an object of class DOM_Document, which is the root node of the memory representation of the DOM tree.

The first of the four navigation functions described above, getNodeValue, uses the DOM API function DOM_Document::getElementsByTagName to search for the proper node in the DOM object, after converting the user input from std::string to the DOMString object required by Xerces. This API function produces a list of DOM_Node objects whose names match the one requested. If the list is not empty, the first hit is the one of interest. The code then extracts the value contained underneath that node (its first child node). As mentioned above, this function should be used only when there is no ambiguity in the tag name and a single value is expected within the tag (e.g., <tag>value string</tag>). The value string is finally converted into an std::string and returned.

If a tag name is used more than once in the XML document, but in a different context (i.e., with a different parent node in the XML tree), then the navigation function getSubNodeValue resolves this ambiguity. One example of an XML document describing a financial contract is the following:

...
<equity>
    <volatility>0.3</volatility>
    ...
</equity>
<short_rate>
    <volatility>0.05</volatility>
    ...
</short_rate>
...

Here the <volatility> tag is used twice, but in different contexts: the first time it describes the volatility of an equity asset, the second time the volatility of the interest rate. getSubNodeValue takes two parameters. The first is the name of a parent node, or of any ancestor node that suffices to disambiguate the sub-node. In this case, it would be either equity or short_rate. The second parameter is the name of the desired sub-node, in this case the ambiguous volatility. Note that the first search, for the parent node, is performed in the same way as in the simpler function getNodeValue. Once the parent node is found, the search for the sub-node starts from there. This time, however, the search function getElementsByTagName cannot be used, because it is a member function of the DOM_Document class and not of the more generic DOM_Node class from which DOM_Document is derived.

The DOM_Node is the base class for all possible nodes in a DOM document, such as DOM_Attr, DOM_Element, DOM_Document, etc. A DOM_Document node is a particular kind of node: it is the root node of the DOM tree. So I have added a new search function to the WU_XMLFactory class, to allow for searches starting from a generic node of the tree downwards.

The member function getNodeRecursive requires two parameters. The first one is an object of class DOM_Node, to indicate the node of the tree from which the search should start. The second parameter is a std::string, to indicate the name of the node to look for. If successful, this function returns the DOM_Node required. Otherwise it returns an empty node. The search is performed recursively, starting from the parent node. For every child of the parent node, a search path is started. If a node with the desired name is found, the search finishes successfully, and a DOM_Node object is returned. Otherwise a recursive search from one level down is started. When a search path hits an end leaf (a node with no children), the search continues from the closest sibling of the parent node for this search path. If all searches starting from the higher-level siblings are unsuccessful, an exception is thrown indicating that the desired node was not found.
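
The depth-first search just described can be sketched on a toy tree. This is a minimal sketch, not the code of Listing 2: Node is a hypothetical stand-in for DOM_Node, findRecursive and subNodeValue are illustrative names, and the enclosing "contract" root element is invented for the example. In this sketch the search itself reports failure with a null pointer and the calling function turns that into an exception, which is only one of the possible ways to signal a missing node.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node: a name, a text value, and children.
// (std::vector of an incomplete type is guaranteed to work since C++17.)
struct Node {
    std::string name;
    std::string value;
    std::vector<Node> children;
};

// Depth-first search below `parent`: test each child, then descend into it
// before moving on to the next sibling. Returns nullptr when nothing matches.
const Node* findRecursive(const Node& parent, const std::string& name) {
    for (const Node& child : parent.children) {
        if (child.name == name) return &child;
        if (const Node* hit = findRecursive(child, name)) return hit;
    }
    return nullptr;
}

// Value of `subName`, searched only underneath the first node named `parentName`.
std::string subNodeValue(const Node& root, const std::string& parentName,
                         const std::string& subName) {
    const Node* parent = findRecursive(root, parentName);
    if (!parent) throw std::runtime_error("node not found: " + parentName);
    const Node* sub = findRecursive(*parent, subName);
    if (!sub) throw std::runtime_error("node not found: " + subName);
    return sub->value;
}

// The two-volatility document from the text, built by hand.
Node sampleContract() {
    Node equity{"equity", "", {Node{"volatility", "0.3", {}}}};
    Node shortRate{"short_rate", "", {Node{"volatility", "0.05", {}}}};
    return Node{"contract", "", {equity, shortRate}};
}

// Usage: the same tag name resolves to different values under different parents.
void recursiveSearchExample() {
    const Node doc = sampleContract();
    assert(subNodeValue(doc, "equity", "volatility") == "0.3");
    assert(subNodeValue(doc, "short_rate", "volatility") == "0.05");
}
```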

The last two navigation functions, getNodeArray and getNodeSubArray, are the equivalents of getNodeValue and getSubNodeValue respectively, but allow extraction of an entire array of values at a time, in situations like:

...
<short_rate>
    <dates>
        <item>20010701</item>
        <item>20010901</item>
        <item>20011201</item>
    </dates>
    <values>
        <item>0.05</item>
        <item>0.06</item>
        <item>0.07</item>
    </values>
    <numDates>3</numDates>
</short_rate>
...

This example represents a case where a simple term structure of interest rates is given: the short rate today (1 July 2001) is five percent, the forward rate on the first of September is six percent, and the forward rate on the first of December is seven percent. With a single call to getNodeArray, or to getNodeSubArray in case of ambiguity, all dates or all values can be extracted at once, filling an array of strings.
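
Array extraction can be sketched on the same kind of toy tree. As before, this is an illustration under stated assumptions rather than the code of Listing 2: Node stands in for DOM_Node, subArray, childNamed, and itemList are hypothetical names, the "contract" root is invented, and for brevity the lookup only inspects direct children instead of searching recursively.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node (std::vector of an incomplete type
// is guaranteed to work since C++17).
struct Node {
    std::string name;
    std::string value;
    std::vector<Node> children;
};

// Direct child of `parent` with the given name, or nullptr if there is none.
const Node* childNamed(const Node& parent, const std::string& name) {
    for (const Node& child : parent.children)
        if (child.name == name) return &child;
    return nullptr;
}

// Text of every <item> directly under parentName/arrayName, in document order.
// Returns an empty vector when either node is missing.
std::vector<std::string> subArray(const Node& root, const std::string& parentName,
                                  const std::string& arrayName) {
    std::vector<std::string> values;
    const Node* parent = childNamed(root, parentName);
    const Node* list = parent ? childNamed(*parent, arrayName) : nullptr;
    if (!list) return values;
    for (const Node& item : list->children)
        if (item.name == "item") values.push_back(item.value);
    return values;
}

// Builds <name><item>..</item>...</name> from a list of texts.
Node itemList(const std::string& name, const std::vector<std::string>& texts) {
    Node list{name, "", {}};
    for (const std::string& t : texts) list.children.push_back(Node{"item", t, {}});
    return list;
}

// The term-structure document from the text, built by hand.
Node sampleTermStructure() {
    Node shortRate{"short_rate", "", {}};
    shortRate.children.push_back(itemList("dates", {"20010701", "20010901", "20011201"}));
    shortRate.children.push_back(itemList("values", {"0.05", "0.06", "0.07"}));
    shortRate.children.push_back(Node{"numDates", "3", {}});
    return Node{"contract", "", {shortRate}};
}

// Usage: one call yields all dates as strings; conversion is left to the caller.
void arrayExample() {
    const Node doc = sampleTermStructure();
    const std::vector<std::string> dates = subArray(doc, "short_rate", "dates");
    assert(dates.size() == 3);
    assert(std::stol(dates[0]) == 20010701L);
    assert(subArray(doc, "short_rate", "missing").empty());
}
```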

Finally, the class contains the pure virtual function WU_XMLFactory::build. This function should be implemented by the specialized factories, as shown in the last section. This is the interface for the mechanism with which a specialized factory extracts the information from an XML document, producing an object of a given class.

As mentioned in the first section, the factory is used to produce objects that outlive the factory itself, in other words, objects that are persistent in memory. At the same time, a single factory object can produce objects not necessarily of the same class, because it may need to create sub-components first and then assemble them into the final component. All these objects should then share a common property: persistence. It is therefore a natural choice to devise an interface (in Java jargon) that acts as the abstract base class for all the objects that need persistence. This is what the class WU_Persistent is for, as shown in Listing 3 (header and source code). This class fundamentally plays three roles:

  1. It acts as the base class for different objects, allowing them to be stored in a simple container.
  2. It associates persistent objects with a unique ID: m_ID.
  3. It contains a pure virtual function, clone, forcing all persistent objects to implement their version of this function.

The clone member function is required because a would-be persistent object, created within the scope of the build function of the factory, is destroyed when going out of scope. For this reason, a clone of the object is created with clone, using the copy constructor of the object, and is placed on the heap memory by the new operator. In this way, clone returns a pointer to a WU_Persistent object, which is the address of the cloned object.
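
The clone idiom can be sketched as follows. This is a minimal sketch and not Listing 3: Persistent mirrors the three roles listed above, while DiscountCurve, buildCurve, the id() accessor, and the "curve:flat" identifier are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>

// Sketch of a persistent base class: a unique ID plus a polymorphic clone.
class Persistent {
public:
    explicit Persistent(const std::string& id) : m_ID(id) {}
    virtual ~Persistent() {}
    const std::string& id() const { return m_ID; }
    // Returns a heap-allocated copy; whoever receives it must delete it.
    virtual Persistent* clone() const = 0;
private:
    std::string m_ID;
};

// Hypothetical concrete component.
class DiscountCurve : public Persistent {
public:
    DiscountCurve(const std::string& id, double rate)
        : Persistent(id), m_rate(rate) {}
    double rate() const { return m_rate; }
    // The copy constructor does the work; new places the copy on the heap.
    Persistent* clone() const override { return new DiscountCurve(*this); }
private:
    double m_rate;
};

// A build-like function: the local object dies at scope exit, the clone survives.
Persistent* buildCurve() {
    DiscountCurve local("curve:flat", 0.05);
    return local.clone();
}

// Usage: the clone keeps both the ID and the dynamic type of the original.
void cloneExample() {
    Persistent* p = buildCurve();
    DiscountCurve* curve = dynamic_cast<DiscountCurve*>(p);
    assert(curve != nullptr);
    assert(curve->id() == "curve:flat");
    delete p;   // in the article's design the container does this instead
}
```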

In order to prevent memory leaks when the main application (e.g., the server application) shuts down, as soon as the objects are cloned they are stored in a container, which has a destructor function that takes responsibility for deleting them. This is illustrated in Listing 4, with the header and source code for the class WU_Container.

This class basically wraps a std::map associative container, where each persistent object (a pointer to a WU_Persistent object) is associated with its unique ID (a string).

In this basic implementation of a container, two member functions are used to store and retrieve objects from the container: registerComponent and retrieveComponent, respectively. The first function takes the pointer to a WU_Persistent object to be stored and its unique ID. These objects are paired (make_pair(ID,obj)) and inserted into the map. If the insertion fails because the identifier already exists in the map, an exception is thrown. The second function retrieves a pointer to a persistent object from the map, using its ID for the search. If the ID is not found, a null pointer is returned instead.
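
A container with this behavior might look like the sketch below. It follows the description above (a std::map of raw pointers, an exception on a duplicate ID, a null pointer for an unknown ID, deletion in the destructor) but it is not Listing 4: Persistent is reduced to an empty base class, the exception type std::runtime_error is an assumption, and so is the rule that a rejected object remains the caller's responsibility.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Reduced stand-in for the persistent base class.
class Persistent {
public:
    virtual ~Persistent() {}
};

class Container {
public:
    Container() {}
    // The container owns every registered object and deletes it at the end.
    ~Container() {
        for (auto& entry : m_map) delete entry.second;
    }
    // Copying would lead to double deletion, so it is disabled.
    Container(const Container&) = delete;
    Container& operator=(const Container&) = delete;

    // Throws if the ID is already taken; the caller then still owns `obj`.
    void registerComponent(Persistent* obj, const std::string& id) {
        if (!m_map.insert(std::make_pair(id, obj)).second)
            throw std::runtime_error("duplicate component ID: " + id);
    }
    // Returns a null pointer when the ID is unknown.
    Persistent* retrieveComponent(const std::string& id) const {
        auto it = m_map.find(id);
        return it == m_map.end() ? nullptr : it->second;
    }
private:
    std::map<std::string, Persistent*> m_map;
};

// Usage: store, retrieve, and reject a duplicate ID.
void containerExample() {
    Container c;
    c.registerComponent(new Persistent, "timeline");
    assert(c.retrieveComponent("timeline") != nullptr);
    assert(c.retrieveComponent("missing") == nullptr);
    Persistent* duplicate = new Persistent;
    bool threw = false;
    try {
        c.registerComponent(duplicate, "timeline");
    } catch (const std::runtime_error&) {
        threw = true;
        delete duplicate;   // the insertion failed, so the container never owned it
    }
    assert(threw);
}
```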

An Example

In this section, I give a short example of how the factory is used to extract data from an XML file and to populate the input for a financial contract describing an equity option. Listing 5 is divided into three sections. Section 1 shows a part of the XML input needed to describe the option. Section 2 describes part of the application code where a factory is created and an equity option is priced. Finally, Section 3 describes part of the specialized WU_FactoryEquityOption1F::build code.

The class WU_FactoryEquityOption1F is derived from WU_XMLFactory and specializes the build function to the case of an equity option. In Section 2, this factory is created with a string buff containing the XML input as the first parameter, the address of a persistent container as the second parameter, and the enumeration XML_MEMORY as the third, indicating that the input string buff contains the XML document directly. The factory object then calls its build function (factory.build) to populate its member data m_object with a pointer to a WI_EquityOption1F object. This pointer is returned and stored in option, which is used to call the price member function and to obtain the option price. All of this is wrapped in a try-catch block, to handle possible exceptions thrown from the library modules.

Section 3 shows the factory in action. For example, the option valuation date is retrieved from the XML input by the function WU_XMLFactory::getNodeValue described above. After being transformed into a long, this value is stored in the data structure WD_InstrumentData idata. During the build process, many sub-components are created and stored in the container. One is WM_AssetTree1D* pat, whose constructor requires previously created sub-components, such as pwt1d and pdd, which should be cast from WU_Persistent*. The component addressed by pat is cloned, and its address is stored in the container.

Finally the equity option object is created and its address is assigned to the m_object member datum of the factory. For brevity in the code shown here, the result of dynamic casting is not checked for casting errors.
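
For completeness, a checked version of the cast could look like the following sketch. It is not part of Listing 5: retrieveAs, Lookup, TimeLine, and DiscountCurve are hypothetical names, and a plain std::map of non-owning pointers stands in for the container so that the example stays short.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Reduced stand-ins for the persistent base class and two component types.
struct Persistent { virtual ~Persistent() {} };
struct TimeLine : Persistent { int numDates = 3; };
struct DiscountCurve : Persistent {};

// Non-owning lookup table standing in for the persistent container.
typedef std::map<std::string, Persistent*> Lookup;

// Retrieve a component and verify its dynamic type before using it.
template <class T>
T& retrieveAs(const Lookup& lookup, const std::string& id) {
    Lookup::const_iterator it = lookup.find(id);
    if (it == lookup.end())
        throw std::runtime_error("component not found: " + id);
    T* typed = dynamic_cast<T*>(it->second);   // null when the type does not match
    if (!typed)
        throw std::runtime_error("component has unexpected type: " + id);
    return *typed;
}

// Usage: the right type is returned, the wrong type is reported as an error.
void checkedCastExample() {
    TimeLine timeline;
    Lookup lookup;
    lookup["timeline"] = &timeline;
    assert(retrieveAs<TimeLine>(lookup, "timeline").numDates == 3);
    bool threw = false;
    try {
        retrieveAs<DiscountCurve>(lookup, "timeline");
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

Returning a reference rather than a pointer means that code receiving the component never has to test for null; every failure surfaces as an exception at the point of retrieval.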

References

[1] FpML is the Financial product Markup Language (<www.fpml.org>) from the FpML Products Working Groups.

[2] FIXML is the XML-based form of the Financial Information eXchange (FIX) protocol (<www.fixprotocol.org>), owned and maintained by FIX Protocol, Ltd.

[3] <http://xml.apache.org>

[4] Eric Armstrong. <http://java.sun.com/xml/docs/tutorial>.

[5] The World Wide Web Consortium at <www.w3.org>.

About the Author

After working for several years in experimental physics in Italy and Sweden, Gualtiero Chiaia moved to London where he now works as a financial engineer at Quantin’ Leap Ltd., a company that produces software for the financial industry. He can be reached at gchiaia@quantinleap.com.