July 2002/XParam — A General Purpose Serialization Framework for C++

Software Tools

XParam — A General Purpose Serialization Framework for C++

Michael Brand, Ronnie Maor, and Sasha Gontmakher

From command-line processing to object serialization — this class does it all.

Introduction

One of the first things a program should do is parse input parameters. There are several tools for accomplishing this, such as the GNU getopt [1]. However, most of these tools are C-based.

We have developed a C++-based tool for parsing parameters. It improves the process in a way similar to how iostream is an improvement over printf/scanf: it is type-safe and extensible. We call the tool XParam, which stands for Transfer of Parameters.

Traditionally, program parameters are expected to be of basic types. getopt allows you to pass integers, floats, and string values to programs. While this is sufficient in C, in C++ you want to be able to use any object as a parameter. In XParam, parameters can be of simple types, such as int, double, or string. In addition, they can be of any C++ type, including STL classes and user-defined types.

In order to initialize general C++ types from input text in a type-safe manner, XParam includes a complex value manipulation engine. This engine must know parts of the types’ interfaces. Ideally, it should receive the necessary information directly from C++. However, C++ has no reflection capabilities. Therefore, these types need to be described to XParam explicitly. This is accomplished by registration, which we will describe later in this article.

Using object-oriented methodology in parameter handling naturally suggests extensions that were previously impossible. We added a crucial feature that allows polymorphism across the input barrier. This addition almost immediately found use in several software projects that employed XParam as their plug-in management framework, easily achieving what previously necessitated a very costly implementation.

Adding an output capability to XParam was also immediately put to use for cross-program communication: the output of one program can readily be processed as the input of another and can be passed across communication lines and between platforms.

In this article, we present XParam and describe some interesting C++ techniques that were used in writing it. We will base our examples on a hypothetical object-drawing program, which uses classes such as Point, Line, and Circle.

XParam is free and available at <http://xparam.sourceforge.net>.

Usage Examples

Suppose you want to write a program for drawing shapes on the graphic screen. With some work, you can program classes Line, Point, Circle, etc., which can display themselves. There is only one thing left to do at this point: you need to write the main function. But how do you get the user to tell you what she wants to draw?

Without XParam, this would be a grueling task: every time you want to finish a project of this sort, you need to write command-line parsing code specific to it and its objects. With XParam, however, command-line parsing is generic and easy to use, so you are only a few lines of code away from a working program.

Concrete Classes

To keep the explanation simple, we will begin by letting the user enter two points. The program will draw a line connecting them.

A sample call to your program might look like this:
draw "p1=Point(7,8) p2=Point(10,3)"
The program that reads this input and draws the line looks like this:
#include <xparam.h>
#include "drawing_objects.h"
using namespace xParam;

int main(int argc, char* argv[]) {
  Point pt1,pt2;
  ParamSet ps;
  ps << iParamVar(pt1,"p1")
     << iParamVar(pt2,"p2");

  ps.input(argc,argv);
  Line l(pt1,pt2);
  l.draw();
  return 0;
}
After declaring the two Point variables, the program defines a ParamSet object. A ParamSet contains the list of parameters that you wish to read. Parameters are inserted into the ParamSet using operator<<. For each parameter, you specify the name that users will assign a value to on the command line (e.g., "p1") and the variable into which XParam will store the value read (e.g., pt1). Typically, you would use the same name for both the C++ variable and the parameter. The only reason for not doing so here is to clarify that these two names are conceptually independent.

The call to ParamSet::input reads and parses the command line. Once this line executes, pt1 and pt2 are initialized with values and ready to use, so we can simply construct Line l from them and draw it.

When XParam parses the input Point(7,8) it understands that it should build a Point object from two integers. It does this by calling Point’s suitable constructor. For this to work, all you need to do is to register class Point and its constructors. Registration is a simple and non-intrusive process. Listing 1 shows the registration code for this example. Link it into the project and you’re done. The registration code will be explained later on in this article.

As with iostream, the input process is type safe. The variables pt1 and pt2 have a C++ type of Point. Trying to initialize them with anything that isn’t compatible with a Point will trigger error handling. On the other hand, if Point’s constructor requires doubles instead of integers, the input Point(7,8) will still work — XParam will automatically convert the integers to doubles.

Handling Complex Classes

You are now ready for Draw 2.0, a follow-up to your successful drawing program. Draw 2.0 will still draw a line, but instead of reading two Point objects and combining them, it will read a Line object directly.
int main(int argc, char* argv[]) {
  Line l;
  ParamSet ps;
  ps << iParamVar(l,"l");
  ps.input(argc,argv);
  l.draw();
  return 0;
}
You can now invoke the program like this:
dr2 l="Line(Point(2,3),Point(4,5))"
It works! To do this, XParam creates two Point objects, as in the previous example, and then calls Line’s constructor, passing the two Points as arguments.

Since XParam uses the class’s own constructors, the user can build objects on the command line using the same syntax she would use to build them inside a C++ program.

Overloading is also supported. If a class has several constructors, the user is free to use any one of them, and XParam will find and call the appropriate one. Assuming the relevant C++ constructor is defined and registered, you can also invoke Draw 2.0 like this:
dr2 l="Line(Point(2,3), \
            Offset(2,2))"
It will produce the same Line. The choice is now up to the user.

The algorithm that XParam uses to match a suitable constructor is modeled after C++’s overloading resolution rules. If no suitable constructor is found, or an ambiguity is discovered, XParam will trigger an error.

Polymorphism and Dynamic Loading

After the success of your first two drawing programs, you are now ready to flood the market with Draw 3.0, the polymorphic drawing utility. This program has an abstract Shape class that all your concrete shape classes, such as Point, Line, and Circle, inherit from.

To let your users draw an arbitrary shape, the main function is changed to this:
int main(int argc, char* argv[]) {
  Shape* s = 0;
  ParamSet ps;
  ps << iParamPtrVar(s,"s");
  ps.input(argc,argv);
  s->draw();
  delete s;
  return 0;
}
The program can now recognize both s=Line(Point(7,8),Point(9,10)) and s=Circle(Point(1,2),7). XParam will call the correct constructor, making s point to a Line or a Circle, respectively.

As you can see, XParam simplifies your initialization command when it comes to polymorphism and pointer syntax: even though s is a Shape pointer, you still write Line, not Line* or new Line, on the command line.

By allowing polymorphism, XParam allows you to support plug-ins effortlessly. Users can write Shape classes of their own, such as Rectangle or Triangle, and XParam will automatically find and use them. Here’s how it works: when XParam encounters a class name that it does not recognize, it tries to dynamically load the shared library that contains the class and its registration. If such a library is found, XParam uses the new class, just as if it had been registered all along.

Vector Support and Input Redirection

Users of Draw 3.0 may want to draw more than one shape. To support this, you would write Draw2000 like this:
std::vector<Shape*> s;
. . .
ps << iParamVar(s,"s");
. . .
for_each(s.begin(),s.end(),
            mem_fun(&Shape::draw));
. . .
A user might call the program to draw a shape composed of a line and a circle with this execution command:
draw2000 s="[Line(Point(10,10), \
                  Point(20,0)), \
          Circle(Point(10,10),10)]"
The [x,y,...] syntax allows XParam users to pass entire vectors in a single line. Creating vectors this way is actually simpler than doing the same in C++.

Though this is great for drawing two or three shapes, for more interesting drawings writing everything on the command line becomes rather cumbersome. To help with this, XParam provides a feature for reading the input, or parts of it, from a file.

Suppose, for example, that the user has prepared the file in Listing 2. The following call will then draw the stick figure shown in Figure 1:
draw2000 s=@shapes.txt
The @ is a redirection operator. It tells XParam to read the value for s from the file shapes.txt before reading the rest of the command line. The @ operator can appear anywhere a value is required, or where an entire list of key=value commands is expected.

To make XParam read from the standard input, you can use the special redirection source @stdin. This is particularly useful for piping the output of one program into another.
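
The effect of the @ operator on a single value can be pictured with a short sketch. The helper below is hypothetical and is not part of XParam: it assumes that redirection amounts to replacing the token with the text read from the named source, whereas the real library handles redirection inside its parser. The function name resolve_value and the stream parameter that stands in for the standard input are assumptions made for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads everything that is left in a stream into a string.
std::string read_all(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper, not part of XParam: returns the text that a
// value token stands for. A token that starts with '@' names a source
// to read from; any other token is already the value.
std::string resolve_value(const std::string& token,
                          std::istream& standard_input) {
    assert(!token.empty());                  // callers pass non-empty tokens
    if (token[0] != '@') return token;       // plain value, nothing to do
    const std::string source = token.substr(1);
    if (source == "stdin") return read_all(standard_input);
    std::ifstream file(source.c_str());
    if (!file) throw std::runtime_error("cannot open " + source);
    return read_all(file);
}
```

Passing the input stream as a parameter keeps the sketch easy to exercise: a string stream can play the role of the standard input.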

In the future, we plan to extend the redirection mechanism by using the URL concept. This will enable XParam to handle TCP sockets, database connections, HTTP, etc.

Output and Inter-Program Communication

So far we have talked about the Draw2000 program, which only displays its input. Now, we would like to provide a graphic editor that can save drawings to a file in a Draw2000-compatible format. For such purposes, XParam lets you output parameters. Here are the relevant portions of the graphic editing program:
. . .
std::vector<Shape*> s;
. . .
ParamSet ps;
ps << oParamVar(s, "s");
ofstream os("shapes.txt");
ps.output(os);
. . .
After running this program, you can display the shapes with:
draw2000 @shapes.txt
XParam’s I/O format is human readable and editable. This makes XParam ideally suited for managing configuration files.

Single Object Serialization

Often, all you need is to save an object to a stream in order to read it back later, for example to achieve persistence. Here’s how it is done with XParam. To write an object to a file, use XParam’s Saver class:
. . .
MyObj obj;
. . .
ofstream os("obj.txt");
Saver(os) << Val(obj);
. . .
To read the object, use the Loader class:
ifstream is("obj.txt");
Loader(is) >> Var(obj);
The ParamSet class, described above, actually uses Saver and Loader, so the format of serialized objects is the same in both cases.

Registration

We have briefly shown a registration example for the class Point. In this section, we will explain the registration process in more detail.

All classes that you want XParam to recognize should be registered. For your convenience, XParam pre-registers the built-in types and std::string. It also pre-registers the instantiations of std::vector for all these types.

The process of registration is very straightforward (see Listing 1 for an example). The registration code is regular C++ code.

Typically, you want the registration to be performed automatically, so you won’t have to worry about the exact time it takes place. You can do this by enclosing your registration commands between PARAM_BEGIN_REG and PARAM_END_REG macros. These macros do nothing more than execute the C++ code between them before main is entered (see the sidebar for details).
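
The definitions of these macros are not shown in this article. One common way to run code before main is to place it in the constructor of a namespace-scope static object, and the sketch below illustrates that idea. The names DEMO_BEGIN_REG, DEMO_END_REG, demo_registry, and demo_register_class are hypothetical stand-ins, and XParam's actual macros may work differently.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical registry. Keeping it in a function-local static means it
// is constructed on first use, whatever the initialization order of the
// other static objects happens to be.
std::set<std::string>& demo_registry() {
    static std::set<std::string> names;
    return names;
}

void demo_register_class(const std::string& name) {
    assert(!name.empty());
    demo_registry().insert(name);
}

// Hypothetical stand-ins for PARAM_BEGIN_REG and PARAM_END_REG. The pair
// wraps the statements between them in the constructor of a static
// object, so those statements run before main is entered.
#define DEMO_BEGIN_REG                          \
    namespace {                                 \
    struct DemoRegistrar {                      \
        DemoRegistrar() {
#define DEMO_END_REG                            \
        }                                       \
    };                                          \
    DemoRegistrar demo_registrar_instance;      \
    }

DEMO_BEGIN_REG
    demo_register_class("Point");
    demo_register_class("Line");
DEMO_END_REG
```

By the time main starts, demo_registry() already contains both names. As written, this simplified pair can be used only once per source file, because the helper struct has a fixed name.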

Registering a Class

The following command registers the C++ class Point and tells XParam to recognize it in the input under the name "Point":
param_class<Point>("Point");
Most often, the name given to XParam is identical to the C++ name of the class. When this is the case, you can use the macro PARAM_CLASS. The following registration command has the same effect as the previous one:
PARAM_CLASS(Point);
Registering Constructors

Class Point has three constructors:
Point::Point();
Point::Point(int x, int y);
Point::Point(const Point& old);
XParam requires that all concrete classes be copy-constructible and assignable. The copy constructor is registered automatically. The following lines from Listing 1 register the two other constructors:
param_ctor<Point>();
param_ctor<Point>(ByVal<int>("x"),
                  ByVal<int>("y"));
For each parameter in a constructor, you tell XParam its type, its formal name, and how to pass it. In the above example, ByVal<int>("x") tells XParam that the argument’s name is "x", that it’s an int, and that it is passed by value. The formal parameter names are necessary for providing detailed error reporting and run-time help.

Passing by value is not the only way to pass parameters. There are several other modes. For example, arguments passed by const reference use ConstRef<T>, so the constructor:
Line::Line(const Point& a,
           const Point& b);
is registered by:
param_ctor<Line>(
             ConstRef<Point>("a"),
             ConstRef<Point>("b"));
For arguments passed as pointers, you should specify whether the pointer should be deleted by the caller (i.e., XParam), or whether ownership of the pointer is passed to the called class. A pointer parameter can also be either T* or const T*. Thus, these four passing modes are available: CallerPtr<T>, ClassPtr<T>, CallerConstPtr<T>, and ClassConstPtr<T>.

Registering Output

To output your classes, you need to tell XParam how to serialize them. This registration process is as easy as registering constructors. When constructing, you show how to build your class from simpler ones. Conversely, output registration need only tell XParam how to break down an object into simpler sub-objects. When serializing an object, XParam recursively decomposes it into its sub-objects, until reaching primitive types, which are output as literals.

Listing 3 shows the code required to register an output capability for class Point. The code registers Point_output, the helper class that will perform the output. Point_output tells XParam how to decompose the Point object p into two integers. Here’s an example of output produced by this code:
Point(2,3)
Such output can be used by XParam to construct a Point using the second constructor registered above.

Outputting complex objects is still simple. Listing 4 demonstrates that registering class Line is as easy as registering class Point. Output produced by serializing a Line will look like this:
Line(Point(2,3),Point(4,5))
This can be used to rebuild the entire Line object, using the constructors for Point and Line registered above. Note that your output functor does not need to break the Line object all the way down into basic types. You just specify how to decompose the line into Points, and XParam handles the rest.
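
The recursive decomposition described above can be sketched without XParam. In the following illustration, plain overloaded functions play the role that output registration plays in the library: each class lists only its direct sub-objects, and the recursion stops at int. The Point and Line structs and the functions write, write_composite, and to_text are assumptions made for this sketch and are not XParam's API.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Point { int x, y; };
struct Line  { Point a, b; };

// Primitive types end the recursion: they are written as literals.
void write(std::ostream& os, int v) { os << v; }

// Declared up front so the composite writer below can recurse into them.
void write(std::ostream& os, const Point& p);
void write(std::ostream& os, const Line& l);

// Writes Name(first,second), recursing into each sub-object.
template <class A, class B>
void write_composite(std::ostream& os, const char* name,
                     const A& first, const B& second) {
    assert(name != 0);
    os << name << '(';
    write(os, first);
    os << ',';
    write(os, second);
    os << ')';
}

// The per-class description: list the direct sub-objects only.
void write(std::ostream& os, const Point& p) {
    write_composite(os, "Point", p.x, p.y);
}
void write(std::ostream& os, const Line& l) {
    write_composite(os, "Line", l.a, l.b);
}

std::string to_text(const Line& l) {
    std::ostringstream os;
    write(os, l);
    return os.str();
}
```

Note that the writer for Line never mentions integers. If the writer for Point changed, the one for Line would stay as it is.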

This even works if Line’s sub-objects are polymorphic. Consider a Line that supports different ending styles. Such a Line will have a constructor taking two Point* arguments. Serializing a Line with an arrowhead ending will still produce the correct output:
Line(Point(2,3),ArrowPoint(4,5))
Registering output as described in this section carries important benefits. First, it is less error prone than writing the output code manually. Second, if XParam is extended to support other formats, such as XML, your registration code need not change. Finally, you are insulated from changes in the sub-objects. For example, if class Point’s output is changed to use polar coordinates instead of Cartesian coordinates, the registration of Line remains unchanged.

Other Registration Commands

XParam has many other registration commands. You can register conversion operators, describe inheritance relationships, register constants and enumeration types, and so on. The XParam documentation explains these commands in detail.

Implementation

In the previous sections, we described XParam’s main features. At first glance, some of the above may have seemed like magic. In this section, we will show how it can all be translated to C++ code. Several of the techniques we present are widely applicable, regardless of XParam.

Overview

Here is what XParam does when it is asked to build a Line from the following string:
Line(Point(2,3),Point(4,5))
First, the line is parsed to reveal the construction tree shown in Figure 2. At this point, XParam recursively builds the Line object by applying constructors. This is the C++ code that we want to be executed behind the scenes:
Point tmp1 = Point(2,3);
Point tmp2 = Point(4,5);
Line  tmp3 = Line(tmp1,tmp2);
return tmp3;
The problem is that all the information required is only available at run time, as strings. The key to XParam is the ability to create live objects from strings in a type-safe manner.
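
To make the parsing step concrete, here is a small hand-written recursive-descent parser that turns such a string into a construction tree. It is only a sketch: it accepts nothing but class names, parentheses, commas, and unsigned integer literals, and the Node and Parser classes are hypothetical. XParam's own parser is generated with ANTLR and supports a much richer syntax.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One node of the construction tree: an integer literal (no children)
// or a class name together with its constructor arguments.
struct Node {
    std::string text;
    bool is_literal;
    std::vector<Node> children;
};

class Parser {
public:
    explicit Parser(const std::string& source) : src_(source), pos_(0) {}

    Node parse() {
        Node root = expr();
        skip_spaces();
        if (pos_ != src_.size())
            throw std::runtime_error("unexpected trailing text");
        return root;
    }

private:
    static bool is_digit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }
    static bool is_name_char(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    }
    void skip_spaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    bool next_is(char c) {
        skip_spaces();
        return pos_ < src_.size() && src_[pos_] == c;
    }
    void expect(char c) {
        if (!next_is(c))
            throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }

    // expr := integer | name '(' [ expr { ',' expr } ] ')'
    Node expr() {
        skip_spaces();
        Node node;
        node.is_literal = pos_ < src_.size() && is_digit(src_[pos_]);
        const std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (node.is_literal ? is_digit(src_[pos_])
                                : is_name_char(src_[pos_]))) ++pos_;
        if (pos_ == start)
            throw std::runtime_error("expected a class name or an integer");
        node.text = src_.substr(start, pos_ - start);
        if (node.is_literal) return node;

        expect('(');
        if (!next_is(')')) {
            node.children.push_back(expr());
            while (next_is(',')) {
                ++pos_;
                node.children.push_back(expr());
            }
        }
        expect(')');
        return node;
    }

    std::string src_;
    std::size_t pos_;
};
```

Parsing the example string yields a root node named Line with two Point children, each of which has two literal children.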

Holding Values

To deal with objects of different types, XParam needs to be able to access them through a common interface. However, requiring all the types to inherit from a common base is impractical: this is too intrusive and does not solve the problem of basic types. Instead, XParam employs external polymorphism [2]. Using it, we have implemented a class called Value, similar in design to the any class described by Henney [3].
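
The following is a minimal sketch of such a holder, written in the spirit of the any class rather than copied from XParam. The member names type and get are assumptions, and the real Value class offers considerably more. The point to notice is that the held type needs no common base class: the polymorphism lives in the small Holder template that wraps it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <typeinfo>

// Hypothetical, simplified Value: holds an object of any
// copy-constructible type without requiring it to derive from anything.
class Value {
public:
    template <class T>
    explicit Value(const T& v) : held_(new Holder<T>(v)) {}

    Value(const Value& other) : held_(other.held_->clone()) {}
    Value& operator=(const Value& other) {
        held_.reset(other.held_->clone());   // clone first, then replace
        return *this;
    }

    const std::type_info& type() const {
        assert(held_.get() != 0);
        return held_->type();
    }

    // Type-safe access: returns null unless the held type is exactly T.
    template <class T>
    const T* get() const {
        if (held_->type() != typeid(T)) return 0;
        return &static_cast<const Holder<T>*>(held_.get())->value;
    }

private:
    // The common interface lives outside the held types.
    struct HolderBase {
        virtual ~HolderBase() {}
        virtual HolderBase* clone() const = 0;
        virtual const std::type_info& type() const = 0;
    };
    template <class T>
    struct Holder : HolderBase {
        explicit Holder(const T& v) : value(v) {}
        HolderBase* clone() const { return new Holder(value); }
        const std::type_info& type() const { return typeid(T); }
        T value;
    };

    std::unique_ptr<HolderBase> held_;
};
```

A Value built from an int answers get<int>() with a pointer to the number and get<double>() with null, which is what makes later constructor matching type-safe.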

Constructing Values

You need to do more than simply hold the data. You need the ability to construct a Value from other Value objects, by locating and applying the most suitable constructor.

The built-in C literals are recognized by the parser, and a suitable Value object is built for each of them. In the above example, these are the integers 2, 3, 4, and 5.

Now, XParam has to construct the Value object for Point(2,3) from two Value objects containing the integers 2 and 3. How does XParam find and apply the correct constructor for Point?

As you recall, a class must be registered to work with XParam. Registering a type creates an object of class Type, which stores all the information about that type. This object is stored in the Registry Singleton, which contains a map from strings to Type objects.

Now, to construct the Value for Point(2,3), XParam uses the class name, "Point", to retrieve the corresponding Type object from the registry. It can also retrieve the type information for the arguments 2 and 3 from their Value objects. XParam now needs to find the most suitable constructor for Point and apply it.

Handling constructors polymorphically is very similar to handling Values polymorphically: the constructor information is contained in a class, which has a polymorphic common ancestor, and a template implementation. The polymorphic interface contains, among other methods, the method to apply the wrapped constructor:
Value* Ctor::construct(
       const vector<Value*>& args);
A simplified version of the template implementation of this method for a two-argument constructor is shown in Listing 5. Similar template implementations are available for constructors receiving a different number of arguments.
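
Listing 5 is not reproduced here. As an independent illustration of the same idea, the sketch below pairs a polymorphic Ctor interface with a template implementation for two-argument constructors. Everything in it is hypothetical: Value is reduced to a type tag plus a shared pointer, arguments are passed by value rather than as Value pointers, only exact type matches are accepted, and the names TypedCtor2, CtorTable, and construct_by_name are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical, minimal stand-in for a type-erased value.
struct Value {
    const std::type_info* type;
    std::shared_ptr<void> data;
};

template <class T>
Value make_value(const T& v) {
    Value result;
    result.type = &typeid(T);
    result.data = std::make_shared<T>(v);
    return result;
}

// Checked access: throws unless the held type is exactly T.
template <class T>
const T& value_as(const Value& v) {
    assert(v.type != 0);
    if (*v.type != typeid(T)) throw std::bad_cast();
    return *static_cast<const T*>(v.data.get());
}

// Polymorphic interface shared by all wrapped constructors.
struct Ctor {
    virtual ~Ctor() {}
    virtual Value construct(const std::vector<Value>& args) const = 0;
};

// Template implementation for a constructor C(A, B).
template <class C, class A, class B>
struct TypedCtor2 : Ctor {
    Value construct(const std::vector<Value>& args) const {
        if (args.size() != 2)
            throw std::invalid_argument("two arguments expected");
        return make_value(C(value_as<A>(args[0]), value_as<B>(args[1])));
    }
};

struct Point {
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

// A reduced registry: class name to one wrapped constructor.
typedef std::map<std::string, std::shared_ptr<Ctor> > CtorTable;

Value construct_by_name(const CtorTable& table, const std::string& name,
                        const std::vector<Value>& args) {
    CtorTable::const_iterator it = table.find(name);
    if (it == table.end())
        throw std::invalid_argument("unregistered class: " + name);
    return it->second->construct(args);
}
```

The only place where the static types Point, int, and int appear together is the template instantiation made at registration time. From then on, everything is driven by run-time strings and Value objects.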

Handling Implicit Conversions

Using the process shown above, XParam begins by constructing literal constants and proceeds by building increasingly complex classes until reaching the required class. However, often none of the registered constructors match the required types exactly. When this is the case, XParam will try to find the best matching constructor and will apply an implicit conversion sequence to each of its arguments.

For each argument, the best conversion is the simplest conversion sequence from the given type to the type required by the constructor. The simplicity of a conversion sequence is measured by the kinds of atomic conversions composing it (built-in, user-defined, etc.). By constructing a graph of all the possible conversions, you can employ a shortest-path algorithm, such as Dijkstra’s, to find the best conversion sequence.
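
The shortest-path search can be sketched as follows. This is an illustration under stated assumptions, not XParam's code: types are identified by name, every atomic conversion carries a single non-negative integer cost (for example, a small cost for a built-in conversion and a larger one for a user-defined conversion), and a sequence is ranked by the sum of its costs. The real ranking by kinds of conversions may be more refined than a plain sum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Edge;            // (target type, cost)
typedef std::map<std::string, std::vector<Edge> > ConversionGraph;

void add_conversion(ConversionGraph& g, const std::string& from,
                    const std::string& to, int cost) {
    assert(cost >= 0);   // Dijkstra's algorithm needs non-negative costs
    g[from].push_back(Edge(to, cost));
}

// Returns the cheapest chain of types leading from 'from' to 'to',
// both ends included, or an empty vector when no chain exists.
std::vector<std::string> best_conversion(const ConversionGraph& g,
                                         const std::string& from,
                                         const std::string& to) {
    typedef std::pair<int, std::string> Entry;       // (cost so far, type)
    std::priority_queue<Entry, std::vector<Entry>,
                        std::greater<Entry> > frontier;
    std::map<std::string, int> best;
    std::map<std::string, std::string> previous;

    best[from] = 0;
    frontier.push(Entry(0, from));
    while (!frontier.empty()) {
        const Entry current = frontier.top();
        frontier.pop();
        if (current.first > best[current.second]) continue;  // outdated
        ConversionGraph::const_iterator out = g.find(current.second);
        if (out == g.end()) continue;
        for (std::size_t i = 0; i < out->second.size(); ++i) {
            const Edge& edge = out->second[i];
            const int cost = current.first + edge.second;
            std::map<std::string, int>::const_iterator known =
                best.find(edge.first);
            if (known == best.end() || cost < known->second) {
                best[edge.first] = cost;
                previous[edge.first] = current.second;
                frontier.push(Entry(cost, edge.first));
            }
        }
    }

    std::vector<std::string> path;
    if (best.find(to) == best.end()) return path;    // unreachable
    for (std::string type = to; type != from; type = previous[type])
        path.push_back(type);
    path.push_back(from);
    std::reverse(path.begin(), path.end());
    return path;
}
```

With a cheap built-in conversion from int to double and a costlier user-defined conversion from double to a hypothetical Length class, the search prefers that route over a chain of two user-defined conversions.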

Now that you know the best conversion for every argument in each of the candidate constructors, you need to choose the most suitable constructor. This is done in exactly the same way as done by the C++ compiler.
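
In C++ overload resolution, one candidate beats another when it is at least as good on every argument and strictly better on at least one, and the call is ambiguous when no candidate beats all the others. The sketch below applies that rule to per-argument conversion costs. It is a simplification: representing each conversion as a single integer cost (lower is better) is an assumption of this example, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> ArgCosts;   // one conversion cost per argument

// True when 'a' is at least as good as 'b' on every argument and
// strictly better on at least one.
bool better(const ArgCosts& a, const ArgCosts& b) {
    assert(a.size() == b.size());    // candidates for one call share arity
    bool strictly_better_somewhere = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > b[i]) return false;
        if (a[i] < b[i]) strictly_better_somewhere = true;
    }
    return strictly_better_somewhere;
}

// Returns the index of the candidate that beats all others, or -1 when
// there are no candidates or the choice is ambiguous.
int pick_constructor(const std::vector<ArgCosts>& candidates) {
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        bool wins = true;
        for (std::size_t j = 0; j < candidates.size() && wins; ++j) {
            if (j != i && !better(candidates[i], candidates[j])) wins = false;
        }
        if (wins) return static_cast<int>(i);
    }
    return -1;
}
```

Two candidates with costs (0,1) and (1,0) are ambiguous under this rule, even though their totals are equal, which mirrors the error XParam reports when it cannot choose.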

The process described above is a simplified version of the actual matching algorithm. In reality, the algorithm must handle more complicated cases. For example, when constructing a vector from input in the form [a, b, c], the vector type is not always explicitly specified. Since a, b, and c may be of different types, XParam must deduce the best common type T to build the object vector<T>.

Summary

In this article, we have briefly introduced the XParam template library. The full source code and documentation for the latest version at the time of writing (version 1.11) can be downloaded at <www.cuj.com/code>. As newer versions are released, they will be available from <http://xparam.sourceforge.net>.

XParam’s strengths of object serialization and deserialization, command-line input and output, cross-program communication, and plug-in management (through dynamic loading) have already been put to use by many programmers, ourselves included. We hope that you will find the library as useful as we do and encourage you to contact us for comments, suggestions, and improvements. This can be done through the XParam homepage.

We would like to acknowledge the help of the SourceForge organization in hosting the XParam project, and of jGuru in providing the ANTLR parser generator used by XParam. We would also like to thank Richard Stallman for his assistance in turning the XParam library into free software.

References

[1] The GNU C++ Library, The Single Unix Specification, Version 2 (The Open Group, 1997).

[2] Chris Cleeland et al. “External Polymorphism,” Pattern Languages of Program Design 3, edited by Robert Martin et al. (Addison-Wesley, 1998).

[3] Kevlin Henney. “Valued Conversions,” The C++ Report, July/August 2000.

Michael Brand has worked in the computer industry for many years, being employed as a program architecture designer, an algorithm developer, a programmer, and a teacher of programming. His specialties include object-oriented design and C++. He holds a B.Sc. from Tel-Aviv University.

Ronnie Maor has an M.Sc. in computer science from Tel-Aviv University. He has spent the last few years designing and programming large C++ projects and is currently working for Kashya Inc., a start-up company developing new solutions in the field of storage.

Sasha Gontmakher has an M.A. from the Technion, Israel Institute of Technology, and is currently studying towards his Ph.D. degree. He specializes in object-oriented programming and design and parallel programming.