String Concatenation & Expression Templates

C/C++ Users Journal June, 2005

A powerful technique for delayed evaluation of expressions

By Craig Henderson

Craig Henderson has been a professional software engineer for more than 11 years, and is currently studying for a Ph.D. in medical imaging. Craig can be contacted at craig2 .henderson@uwe.ac.uk.

Concatenating text strings is a common task, but generally an expensive one because it involves dynamic (heap) memory allocation, copying, and deallocation. For instance, consider an example where a series of STL string objects hold fragments of text that need to be joined together and returned from a function.

std::string get_message(void)
{
  std::string hello("Hello");
  std::string space(" ");
  std::string world("World");
  std::string exclamation("!");
  return (hello + space + world + ' ' 
   + exclamation + space + exclamation);
}

This perfectly legal implementation generates the result that is required, but what of its efficiency? To understand the performance of this, look at how the return expression is evaluated.

The compiler evaluates the expression as a series of subexpressions from left-to-right, examining each addition in turn and matching an appropriate operator, so the expression is equivalent to:

return ((((((hello + space) + 
    world) + ' ') + exclamation) + 
            space) + exclamation);

The STL basic_string class defines an operator+() free function that is invoked by each of these additions.

21.3.7.1 operator+
template
  <class charT, class traits, class Allocator>
     basic_string<charT,traits,Allocator>
      operator+(const basic_string
         <charT,traits,Allocator>& lhs,
           const basic_string
             <charT,traits,Allocator>& rhs);
Returns: basic_string<charT,traits,Allocator>
        ( lhs).append( rhs)

While the implementation of the function is dependent on the STL implementation you're using, the library I use (the Dinkumware STL that ships with Microsoft Visual C++ 7.0) implements the function exactly as shown, for instance:

template
  <class charT, class traits, class Allocator>
inline basic_string
     <charT, traits, Allocator>
operator+(const basic_string
     <charT, traits, Allocator> &lhs,
   const basic_string
     <charT, traits, Allocator> &rhs)
{ return (basic_string
   <charT, traits, Allocator>(lhs) += rhs); }

While functional, this isn't very efficient. First, the lhs parameter is copied to another basic_string, which involves memory allocation and a memory copy. Then operator+=() is invoked on the new object, which ultimately calculates the length of the resulting string, allocates a buffer for it, copies the string into the new buffer, and then copies the parameter (the second string) onto the end of the first. When operator+=() returns, the temporary object that now holds the result is returned to the caller by value, invoking another object copy. That's a lot of copying, and with a nonoptimized build of the aforementioned test, no fewer than 14 memory allocations are performed.

It's not necessarily as bad as that in the real world though, as some optimizations can be—and are—made in different implementations. Over-allocating the size of the array to a block size beyond what is actually required in anticipation of another concatenation is a common technique used by library writers. Other authors use the copy-on-write (COW) technique to reference count the actual string and only perform a memory allocation and copy if the buffer is going to be written to [1]. Regardless of what optimizations can be made, the fundamental problem still exists, and the optimizations just reduce the impact to some degree. The real problem is the creation of lots of temporary objects to hold the result of each operator+() and to be fed into the next operator+(). Look again at the original expression in a construction of a new string.

std::string result (hello + space + world + 
  ' ' + exclamation + space + exclamation);

This produces five temporary objects that cannot be omitted through optimizations.

hello + space  temporary basic_string1
 temporary basic_string1 + world  
               temporary basic_string2
  temporary basic_string2 + ' '  
               temporary basic_string3
   temporary basic_string3 + exclamation 
              temporary basic_string4
    temporary basic_string4 + space  
               temporary basic_string5
     temporary basic_string5 + exclamation 
              std::basic_string
        result

Whenever possible, optimization techniques should be transparent to users. Techniques such as over-allocation and COW can be encapsulated within the class implementation and therefore have no usability issues for the programmers using the library. There are times, though, when this goal is not achievable within the language constraints, but a small requirement on programmers can yield great performance results.

Expression Templates

I was introduced to the intricacies of expression templates by David Vandevoorde and Nicolai Josuttis in their book C++ Templates [2]. They used the technique to implement delayed evaluation of matrix arithmetic. The basic idea is that, instead of operator+() performing the concatenation and returning the result, it "records" the operation and its parameters and returns an abstraction of the operation. The entire expression is represented by a series of abstractions and it is not until the assignment to the result type is evaluated that the entire expression extraction is evaluated in full. At this time, the abstraction can provide more information about what the result of the entire expression is going to be, specifically the length of the resulting string, and one memory allocation can be made that will host the result.

The requirement for programmers is a small change to the expression. The expression "hello + world," where the two parameters are std::string types, invokes the STL operator+() from 21.3.7.1. I need to force my own operator+() to be called so that I can create the abstraction and delay the evaluation. To do that, I introduce a simple wrapper class, concat_string, and rewrite the expression:

std::string str(hello + world);

to read:

std::string str(concat_string
   <std::string>(hello) + world);

The compiler creates a call to:

operator+(concat_string<std::string> 
        const &, std::string const &)

which I define later.

First, I need an abstraction class for the addition of two character-based data types so that I can add string classes, character pointers, and single characters in any combination that language rules allow. I call it addition; see Listing 1. No surprises there, just a simple constructor to store the two values and a couple of access functions. I use a traits class to determine how to store the values. Literal constants and pointers to literal constants need to be stored by value as the parameters' lifetimes may be shorter than the addition object. All other data are stored as reference-to-const to avoid copying objects—after all that's the whole point of the exercise. So the traits class is straightforward, too; see Listing 2.

There are seven operators to handle combinations of char, char *, std::string, and other concat_string<> parameters. Each operator constructs a concat_string object to store an addition object that in turn references or stores the operator parameter; see Listing 3. Each of these operators returns a concat_string object, so each subsequent subexpression also uses these operators and not the STL string operators.

concat_string(hello) + space  concat_string< addition<> >1
 concat_string< addition<> >1 + world  concat_string< 
      addition<> >2
  concat_string< addition<> >2 + ' '  concat_string< 
      addition<> >3
   concat_string< addition<> >3 + exclamation  concat_string< 
      addition<> >4
    concat_string< addition<> >4 + space  concat_string< 
      addition<> >5
     concat_string< addition<> >5 + exclamation  concat_string< 
      addition<> >6
      concat_string< addition<> >6  std::basic_string 
          [invoking evaluation]

The concat_string class needs to store a copy of the abstraction object because the lifetime of the concat_string object extends beyond the lifetime of the temporary abstraction object returned by the operators. Aside from the constructor and an accessor function to return a reference-to-const of the abstraction object stored within, there is only one other member function in the class, and this is where the full evaluation of the expression is performed. The member function is a template conversion operator to a basic_string type [3], and this is called when a concat_string object is assigned to a basic_string type (Listing 4). In the conversion operator, a basic_string object is created locally and reserve() is called to preallocate enough memory to hold the entire result string (see later for calculating the length of the new string). The text from the abstraction objects is then appended into this string using a helper class, appender, which can handle copying single characters, string types, and abstraction objects into a basic_string class. The local string object is then returned from the conversion operator, by value. If the compiler supports named return value optimizations (NRVO), then no more temporaries will be created here.

Bits and Pieces

The appender class provides a generic means of copying data into a basic_string object. This is needed because the STL basic_string class knows nothing of the addition class. The appender class provides three function call operators—one for a single character, one for a basic_string object, and the third for an addition object (Listing 5).

The final piece of scaffolding is the mechanism for calculating the length of the result string. This is done by recursively totaling the sum of the lengths of the component strings. This is done through a series of free functions that know about the length of a specific type—one for a single character, a point to characters, a basic_string object, and an addition object (Listing 6).

Results

The success of this technique is difficult to measure accurately as a runtime performance saving, so instead I measured the number of requests to the heap manager for memory allocation. Using Listing 7, I ran the tests with T1 and T2 both as std::string types, and again with T1 as concat_string<std::string> and T2 as std::string.

Using Microsoft Visual C++ 7.0, the first test compiled without optimizations with native std::string types resulted in 14 memory allocation requests, and the second test using the new technique results in three allocation requests.

Conclusion

Expression templates are a powerful technique for delayed evaluation of expression, and we can take advantage of the delay and allocate just one buffer that is large enough to contain the string that results from the entire expression. The expression is evaluated only when it is assigned to a basic_string object, so if the expression is never assigned, then the expression is never evaluated. This is contrary to the normal behavior that would create a host of temporary strings and then just throw them away. The support scaffolding is fairly extensive, but the end result is worth it. I have shown a saving of 11 memory allocations in the running example, and this is just the start of the savings. By eliminating the temporary objects that are created during normal evaluation of a string concatenation expression, I have prevented not only the heap allocations that I've already demonstrated, but the associated memory copying and deallocation, too. Perhaps more importantly, the new allocation measure is a constant value—regardless of the number of additions in the expression, the same number of memory allocations will be performed.

References

  1. [1] The STL shipped with this compiler employs an optimization for small strings that use an internal buffer until the string exceeds the buffer length and then moves to heap allocation. For the testing, I added a length of text to the "Hello" string to exceed the internal buffer limit immediately.
  2. [2] Vandevoorde, David and Nicolai Josuttis. C++ Templates, Addison-Wesley, 2003, ISBN 0201734842.
  3. [3] Microsoft Visual C++ 7.0 fails to compile with this template conversion operator, so an std::string operator is provided for this compiler.