Features


Adaptable-Size Classes

Julian Bushell

Save space by letting the C++ template facility generate classes with only the data you need.


This article describes a technique I developed called adaptable-size classes. This technique helped me build a software system that minimized disk I/O and maximized available memory at high levels of data throughput. The resulting C++ code was not only easy on the eye, but it also minimized code bloat and promoted the powerful constructs of abstraction and reusability for which C++ is famous.

Introduction

To put this article into context, I will first briefly describe the system that I developed using this technique. Next, I will show the boilerplate code that enables C++ classes to be adaptable in size. Then, I will delve deeper into business data objects and explain some practical technologies that anyone can use to implement their own adaptable-size classes.

To keep the code minimal for this article, I’ve omitted the use of private attributes and accessors. In my explanations, I’ve used the term class to mean both a C++ class and a C++ struct.

History

Not long ago, I had the opportunity to develop a financial reporting system in C++. Its input was future-projected, simulated balance-sheet data items from 5,000 possible future scenarios for 20 future years from over 100 companies: in other words, gigabytes of data, more than could fit into available memory. Any calculation involved relatively slow retrieval from a database.

Being a lazy programmer, I wanted to write as little code as possible and make it look beautiful and maintainable, yet still cope with the massive amount of data to be processed by using memory efficiently while minimizing disk I/O (the bottleneck in system performance). As a result, I decided to develop the adaptable-size class technique.

Adaptable-Size Classes

Imagine a C++ class defined with three attributes:

struct CBA {
    double a;
    double b;
    double c;
};

An adaptable-size class, as I define it, is a family of classes that each contain all, or a subset, of the same attributes, in any order. I call each member of the family a class variation.

Hence to make struct CBA into an adaptable-size class, just define other class variations. For example:

struct CB {
    double b;
    double c;
};

struct CA {
    double a;
    double c;
};

struct AC {
    double c;
    double a;
};

struct B {
    double b;
};

etc.

Defining Adaptable-Size Classes

Every class variation above could be hard coded as shown in the previous section, but doing so rapidly leads to code bloat and decreases the maintainability of your code, which is definitely not in keeping with my lazy programmer attitude. Hard-coded variations also will not possess any of the magical properties highlighted in the next section. With my technique of adaptable-size classes, only one statement is required to define each class variation. For example, the class variations from the previous section are defined as follows:

typedef C_<B_<A_<Base_> > > CBA;
typedef C_<B_<Base_> > CB;
typedef C_<A_<Base_> > CA;
typedef A_<C_<Base_> > AC;
typedef B_<Base_> B;

Taking an arbitrary class variation:

typedef C_<A_<Base_> > CA;

this defines a class variation CA that uses inheritance to combine the template classes C_ and A_, where A_ contains attribute a and C_ contains attribute c.

template <class Parent> struct A_: public Parent {
    double a;
};

template <class Parent> struct C_: public Parent {
    double c;
};

Hence class CA will contain the attributes a and c. I call template classes C_ and A_ attribute classes. You can think of an attribute class as a wrapper class that exposes one attribute.

So to define any class variation combination, just list the attribute classes as nested template parameters, immediately after the typedef and before the class (variation) name.

To recap, an adaptable-size class is made up of class variations, and each class variation is made up of a subset of attribute classes in any order.

Magical Powers of Adaptable-Size Classes

So far, I’ve told you how to define an adaptable-size class, but not what it can do. For starters, by adding an assignment operator to each attribute class, you can assign one class variation object to another and, in just one line of code, copy the values of all attributes that are common to both classes.

For example, instantiating these objects:

CBA cba;
CA ca;

and then later assigning cba to ca:

ca = cba;

will cause the values of attributes a and c of cba to be assigned to the corresponding attributes a and c of object ca. Conceptually, the compiler expands the line ca = cba to:

ca.a = cba.a;
ca.c = cba.c;

In fact, adaptable-size classes allow any combination of attributes’ values to be copied/assigned. Here are some further examples with comments on the right:

cba = b;   // cba.b=b.b
cb = cba;  // cb.b=cba.b and cb.c=cba.c
ac = ca;   // ac.c=ca.c and ac.a=ca.a
cba = ac;  // cba.a=ac.a and cba.c=ac.c
cba = cb;  // cba.b=cb.b and cba.c=cb.c

Notice that it doesn’t matter whether the left-hand-side object contains a subset of the attributes of the right-hand-side object or a superset of them.

Of course, this isn’t just limited to assignment operators; it can be extended to any behavioral operation or method you can imagine. For example, addition:

cba = ca + b;

Required Boilerplate Code

Listing 1 shows the boilerplate code required for an attribute class A_ with an assignment operator and a copy constructor that allow class variations to be assigned to and copied from each other. This code is the same for each attribute class; the only differences are the class name and its attribute name. The full source code for Listing 1 is in the Visual C++ project CBA available for download at <www.cuj.com/code>. The source code has also been tested with CodeWarrior Pro 6.1.

The Base_ class is used as the root class to end the nesting of attribute classes when defining a class variation.

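struct Base_ {
    // Accepts any type, so construction from another class variation can end here.
    template <class T> Base_(const T &) {}
    Base_() {}

    // Catch-all: absorbs set() requests that no attribute class matched.
    void set(...) const {}
};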

The boilerplate code relies heavily on the use of overloading and templates to achieve the overall functionality provided by each class variation.

Figure 1

It is useful to understand that each class variation has its own individual inheritance hierarchy of attribute classes, each starting with Base_ as the root. (See Figure 1 for the class hierarchy of two class variations.) By calling parent class methods and using overloading to traverse up and down these hierarchies, the boilerplate code is able to find the common matching attributes between the two class variations. In Figure 1, the common attributes are those in classes C_ and A_. So executing the line:

cba = ca;

causes ca’s a attribute to be assigned to cba’s a attribute and ca’s c attribute to be assigned to cba’s c attribute.

Okay, let’s delve deeper into this. (If you already understand how this works, please skip to the next section.) The first method executed is cba’s assignment operator (i.e., C_<B_<A_<Base_> > >::operator=(ca)).

Examine the source code for cba’s assignment operator:

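template <class T> C_ & operator=(const T & t_)
{
    // Let the parent attribute classes assign their attributes first.
    Parent::operator=(t_);
    // Ask the right-hand side to supply a value for c, if it has one.
    t_.set(this->c, C_<Unique_>());

    return *this;
}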

The first line, Parent::operator=(t_), invokes the assignment operator of cba’s Parent class, B_<A_<Base_> >::operator=(ca). Since the boilerplate code for each attribute class is the same, execution proceeds up the inheritance hierarchy from B_<A_<Base_> > until the Base_ class is reached.

Figure 2

Figure 2 illustrates the following flow. As the default assignment operator of Base_ does nothing, the program stack is unwound to A_<Base_>::operator=(ca), and its second line, t_.set(this->a, A_<Unique_>()), is called. This effectively swaps the left-hand-side and right-hand-side class variations, and this time the method Parent::set(t_, t) in the boilerplate code (see Listing 1) leads to a traversal up ca’s class hierarchy. This continues until ca’s Base_ class has been reached or until an overloaded set method with a second parameter of type A_<Unique_> has been matched in one of the attribute classes that make up ca’s class hierarchy. In this case, ca’s attribute class A_ does have such a method, set(double & a, A_<Unique_>). Its implementation is a = this->a, which assigns the value in ca’s a attribute to cba’s a attribute. Phew, that’s one attribute assigned.

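Next, the program stack is unwound again, this time to B_<A_<Base_> >::operator=(ca), and the method set(this->b, B_<Unique_>()) is looked for in ca’s class hierarchy. Since ca does not have a B_ attribute class, no matching method is found before class Base_ is reached, where the catch-all set(...) does nothing and cba’s b attribute is left unchanged.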

The program stack is unwound one last time, back to cba’s own assignment operator. The class hierarchy for ca does have a set(double & c, C_<Unique_>) method, so ca’s c attribute is assigned to cba’s c attribute.

A Real World Example

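There are three things that need to happen when I code: the source code must be readable, understandable, and maintainable; I must avoid the conceptual shift between the relational database world and C++; and database access must be minimized.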

The first point is a matter of readability, understandability, and maintainability of my source code. I like to represent real-world business objects in the code itself. With adaptable-size classes, I can do this throughout the code. I name the attributes and attribute classes in a manner consistent with the naming conventions used by the business domain experts. For example, here are some of the attribute classes that make up the adaptable-size class for a company balance sheet:

RetainedProfit_, ShareCapital_,
CurrentLiabilities_, Machinery_,
Land_, Cash_, Debtors_, Stock_

Next, here are some class variations in my source code using these attribute classes:

typedef RetainedProfit_<
  ShareCapital_< 
  CurrentLiabilities_< 
  Machinery_<Land_<Cash_< 
  Debtors_<Stock_<
  Base_> > > > > > > > BalanceSheet;

typedef Cash_<Debtors_<Stock_<
  Base_> > > CurrentAssets;

typedef Machinery_<Land_<
  Base_> > FixedAssets;

typedef CurrentLiabilities_<
  Base_> CurrentLiabilities;

typedef RetainedProfit_<
  ShareCapital_<Base_> > Equity;

Then in true generic programming fashion, I would place these objects in STL containers when needed and feed them to various algorithms that would access them via iterators. But how do you get them from the database to the STL containers in the first place?

This brings me to my second point: avoiding having to deal with the conceptual shift from the relational database world to the object-oriented/generic paradigms of C++. My answer is to use a database abstraction layer that makes my database look like an STL container. This technology is called the DTL (Database Template Library) and is available as a free download. DTL allows an ODBC database to be viewed as an STL container and is an easy way of storing and retrieving adaptable-size objects to and from a database. Without going into too much detail, DTL works by allowing attributes in classes to bind with fields in database tables.

To get DTL working with adaptable-size classes, all you need to do is implement an extra method in the boilerplate code of each attribute class. For example, this method in the Debtors_ attribute class:

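void bindColumns(BoundIOs &cols)
{
    // Bind this attribute to the "debtors" column...
    cols["debtors"] == debtors;
    // ...then let the parent attribute classes bind theirs.
    Parent::bindColumns(cols);
}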

binds the attribute debtors to a field “debtors” of a database table and then calls the bindColumns method of the parent class, ensuring that all the other attributes that make up the class variation are bound to the database table as well. For details on how to use DTL, visit its website. For details on how to use DTL with adaptable-size classes, see the source code in the BalanceSheet Visual C++ 6 project available for download at <www.cuj.com/code>.

Instead of setting up SQL statements to select the data that is needed from the database and then extra code to put them into objects’ attributes, you can set up a class variation in one line:

typedef Cash_<Debtors_<Stock_<
  Base_> > > CurrentAssets;

Then give this class to DTL and let its abstraction layer deal with loading/saving the data in the object attributes from/to the database table.

This brings me to my last point: minimizing database access. If a user of my software system elects to run a report involving current assets, I keep the CurrentAssets objects in memory after the report has completed. If the next report requires a calculation that uses, for example, only Debtors, then because debtors is one of the attributes of CurrentAssets, some or all of the values needed will already be held in memory from the previous run. I then assign each CurrentAssets object in memory to the smaller class variation Debtors using the one-line assignment:

Debtors = CurrentAssets

Hence, this eliminates the need to reload any Debtors details that are already being held in memory via any CurrentAssets objects. When memory is at a premium, because Debtors objects are smaller than CurrentAssets objects, I can load a lot more into memory.

Conclusion

Adaptable-size classes provide an easy-to-use syntax for defining a family of classes that share a subset of common attributes. The ease with which class variations of an adaptable-size class can be defined allows for custom classes that hold only the data required for a particular algorithm. This allows for efficient use of memory. If objects of these classes are persisted in a database, then disk I/O is also reduced, as unnecessary attributes are not read from or written to the database.

Adaptable-size classes allow for easy assignment/copying of values of common attributes between different classes. This type of cooperating behavior can be extended to other operators and methods.

Acknowledgement

Thanks to Jason Poynting for proofreading and pointing out a code improvement.

Julian Bushell holds a BSc(Eng) in Computing Science from London Imperial College of Science, Technology, and Medicine. He is a software consultant currently working with Deloitte and Touche and specializes in financial modeling systems. He is also studying for an MSc in Financial Markets and Derivatives at London Guildhall University. He can be reached at jabushell@btinternet.com.