C/C++ Users Journal October, 2004

Ranges: Concepts and Implementations

A simple and efficient technique for handling collections

By Matthew Wilson and John Torjo

Matthew Wilson is a software development consultant for Synesis Software and creator of the STLSoft libraries. He is author of the book Imperfect C++ (Addison-Wesley, 2004). Matthew can be contacted via http://imperfectcplusplus.com/. John Torjo is a freelancer and consultant who specializes in C++, generic programming, and streams. He can be reached at john@torjo.com.

As its name implies, a range represents a bounded collection of elements, which may be accessed in an incremental fashion. In this two-part article, we examine how the Range concept has several advantages over the current state of the art with respect to handling collections. First, a range affords several modest, but worthwhile, syntactic advantages. Of more significance, a range expands the common representation and manipulation of collections to virtually all currently known collection models. This is crucially different from the STL, which provides only representation and manipulation of sequences and sequence-like objects, or containers that have been persuaded to act like sequences (std::map, for instance). The final advantage is that a range supports a powerful and succinct mechanism for applying filters to ranges of elements.

The Range concept is not allied with, nor dependent upon, any particular library. In this installment, we demonstrate the concept with code drawn from the STLSoft libraries, Version 1.7.2 onwards [1]. In the next installment, we'll focus on the range implementation based around the Boost libraries. For more on the Range concept, including information, code-links, and expository material, see [2]. In addition, see Chapter 34 of Imperfect C++, by Matthew Wilson (Addison-Wesley, 2004).

What's Wrong With Iterators?

The short answer to the question of, "What's wrong with iterators?" is that there's nothing wrong with iterators (or rather, the STL Iterator concept), but only as far as it goes. Iterators are probably the most important concept in the STL, as they are the medium of communication between containers and algorithms. The abstraction they provide facilitates the separation of container types from the algorithms that may be used with them. This is a profound achievement. Iterators are able to express enumeration concepts as diverse as single-duplex communications lines, linked-lists, indexed enumeration APIs, and pointers.

Still, iterators have some limitations. The first, and most obvious, being that using iterators can be cumbersome and can result in exceedingly messy client code. To be sure, when used with suitable algorithms and functions (or function objects), iterators provide almost unrivalled succinctness and clarity—there are few more pleasing things in C++ than statements such as:

for_each(vars.begin(),vars.end(),
  ostream_iterator<var_t>(cout,"\n"));

The problem is when you need to do something that's not provided for by standard algorithms and function objects. In such cases, you must either write a custom function (object) or, more commonly, unroll the algorithm into a manual loop. In either case, elegance and clarity goes out the window. Consider this code:

void CoalesceOptions(. . .)
{
  . . .
  { OptsUsed_map_t::const_iterator 
     b = usedOptions.begin();
    OptsUsed_map_t::const_iterator 
     e = usedOptions.end();
  for(; b != e; ++b)
  {
    OptsUsed_map_t::value_type const 
     &v = *b;
    if( !v.second &&
        v.first->bUseByDefault)
    {
      arcc_option option;
      option.name  = v.first->fullName;
      . . .
      arguments.push_back(option);
    }
  }}
  . . .

Such things can be especially problematic when there are two or more enumerations to be performed in the same scope. Not only do you have a whole lot of code to look at, but you may also run short of recognizable iterator names. Whether you're a user of begin and end, or b and e, things get abstruse when you've got b_options, b_compilers, b_languages, and so on littering your 80-column (or even 120-column) code view. The new for-scoping rules can ameliorate this somewhat, but relying on them is still a portability headache [3]. The first advantage of the Range concept is that it provides a big help in addressing this situation.

The other, more fundamental, advantage of ranges is that they encompass a broader range of collections than that supported by iterators. Although iterators provide powerful and low-grained manipulation of elements in situation, the cost of this power is that the set of collections for which iterators may be used is restricted, as shown in Figure 1.

Consider how you might represent and manipulate notional collections—those where a conceptual range exists but there are no "real" elements in memory. A simple example of this is where you might wish to enumerate the drive letters for a Windows system, 'A'-'Z', via an STL algorithm, as in:

std::vector<char>   valid_drives;
std::for_each(from, to, 
  push_valid_drive(valid_drives));

The problem here is how to define from and to? In this case, most of us would probably unroll the loop, and use for(char ch = 'A'; ch <= 'Z'; ++ch). An STL approach might be something like this (assuming a character encoding whereby the letters 'A'-'Z' occupy contiguous code points):

std::for_each( char_iterator('A'), char_iterator('Z' + 1),
              push_valid_drive(valid_drives));

The alternative would be to use an array of characters, either generate()-d with a custom written functor—more work—or in the following static form:

char drive_letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::for_each( &drive_letters[0],
              &drive_letters[sizeof(drive_letters)], 
              push_valid_drive(valid_drives));

The first approach requires a custom iterator class. You may have such a class, or perhaps an integral_iterator<> template class, that can be used. But you must still provide the second, "phantom" iterator required to terminate the enumeration. Furthermore, you must ensure that the two represent a valid range.

The second approach, while efficient, is brittle and open to mistakes. How easy it would be to follow instinct and add -1 to the sizeof() expression, or to forget the terminating null character placed there by the compiler and add +1. The Range approach would be:

for(integral_range<char> r('A', 'Z' + 1); r; ++r)
{
  push_valid_drive(valid_drives)(*r);
}

or:

r_foreach(integral_range<char>('A','Z'+1),
              push_valid_drive(valid_drives));

For sure, this also requires the use of a class (in this case, the STLSoft integral_range<> template class; see http://www.cuj.com/code/) so that may not necessarily be a gain over the iterator-based approach. But the advantage is that the beginning and ending conditions are kept together. Moreover, since they are passed as parameters to the same function—the range constructor—they may be validated with respect to each other. The integral_range<> template uses assertions to validate the precondition that the end point is reachable by the start point, but another implementation may choose to use exceptions. Either is better than crossing your fingers with the iterators.

If you have a bent for other languages and their for/foreach/For Each constructs, you can easily emulate that with ranges:

int from = . . . ;
int to   = . . . ;
int step = . . . ; // Can be negative

for(integral_range<int> r(from, to, step); r; ++r)
{
  cout << *r; 
}

You no longer need to care about special processing of both incrementing and decrementing ranges, you get runtime validation of the range bounds and increment, and such a range can be passed to range algorithms. But this is still largely just syntactic niceties, and not convincing in and of itself. There are more persuasive cases.

Where the Range concept covers territory not covered by iterators is with callback enumeration APIs. As demonstrated in [4], without standard and efficient coroutines (fibers, to Win32 aficionados), there is no practical and efficient mechanism to adapt callback enumeration APIs to the Input (or any other!) STL Iterator concept. The only practical option is to enumerate the full range and insert the elements into a container (for example, a list or vector), and then operate on the contained elements. As discussed in [4], this can have nontrivial performance and/or data consistency ramifications. We think it's reasonable, therefore, to state that the STL Iterator concept is not compatible with callback enumeration APIs. Because C++ is unlikely to standardize corountines/fibers, this remains the case for the future.

From the perspective of dealing with the collections space in Figure 1, we can say that the STL Iterator concept covers the subset of collections that are sequences, or may be represented as sequences. This covers a large proportion of collections, albeit by strong-arming nonsequence types into behaving like sequences. Nonetheless, there remains a significant part of the collections space that iterators do not handle (non-STL containers), handle badly (notional collections), or cannot handle at all (callback enumeration APIs). Here, we explain how ranges can help you manipulate these other types of collections.

What's In a Range?

First, let's assuage any qualms that this is a complicated concept, requiring a Ph.D. in template metaprogramming and a compiler sharp enough to cut paper. Ranges are simple to use and implement and, for most of the components written so far, are compilable by a majority of commonly used compilers [5]. They're so simple that you may wonder why no one has defined them before. Indeed, you may have worked up similar treatments of some aspects of the concept yourself; both of us had independently worked with aspects of the Range concept as we now see it over several years. What we are presenting is a coherent and relatively mature definition, along with implementations in two libraries. That's not to say that we believe that the concept is 100 percent complete; we're keen to hear from readers who may see parts of an even bigger picture than the one we describe. (Before going into details, it's important to state that there is nothing about the Range concept that limits its implementation to C++. We're intending to look into its application to other languages, such as C# and C++.NET. Matthew is currently integrating ranges into the D language's standard template library, DTL.)

So how are ranges (to borrow some patterns terminology) reified in code? For languages that do not support the necessary operator overloading (or if you don't like overloading operators), range types implementing the first two categories may be defined as having these methods:

class SomeRange
{
public:
  typedef SomeRange   class_type;
  typedef SomeThing   value_type;
// Enumeration
public:
  class_type &advance();
  value_type current() const; // or value_type const&
  bool is_open() const;
  . . .
};

However, if you're comfortable with a little sugary syntactic subterfuge, then you can define operators instead:

class SomeRange
{
  . . .
// Enumeration
public:
  class_type &operator ++();
  class_type operator ++(int);   // post increment
  value_type operator *() const; // or value_type const&
  operator "bool" () const; // See reference [6]
  . . .
};

Ranges are said to be "open" or "closed." An open range can be advanced (by the advance() method or the ++ operator) and accessed (by the current() method or * operator). Attempting to advance or access closed ranges is invalid, nor can they be reopened. The state of a range is indicated by the is_open() method or the "bool" operator [6], which may be called on any range in any state.

The canonical client code looks like this method or operator form:

for(SomeRange r1; r1.is_open(); ri.advance())
{
   func(r1.current());
}
for(SomeRange r2; r2; ++r2)
{
   func(*r2);
}

To this point, we've discussed what a range looks like to the outside world, and how you can manipulate it. But what is actually in a range? It's simple. A range instance contains a snapshot of a view into a given collection. When you pass an instance to an algorithm, the copy created contains the current state of the copied range at the time of copying. The concept provides no mechanisms for accessing the original range. Consider this code, which uses STLSoft's integral_range<> class:

using stlsoft::integral_range;
integral_range<int>   r1(0, 10);
integral_range<int>   r2(++r1);

At its creation, r1 represents the asymmetric range [0, 10). r2, at the epoch of its creation, represents the asymmetric range [1, 10), because it was copied from r1 after it was advanced by the ++ operator. r2 has no notion of being constructed from something that formerly held a wider range than it was given. Although our earlier work had mechanisms for accessing the originating range values, we now find that this simplification is helpful to understanding the concept, useful for implementing ranges based on STL iterators, and necessary for meaningfully representing all range types.

Range Categories?

We have currently identified three range categories: notional, iterable, and indirect. The following category tag classes from the STLSoft range implementation indicate the relationships between them:

struct notional_range_tag
{};

struct iterable_range_tag
    : public notional_range_tag
{};

struct basic_indirect_range_tag
{};

struct indirect_range_tag
  : public basic_indirect_range_tag
{};

(The basic_indirect_range_tag is an implementation feature, rather than representing a bona fide range category.)

Range classes of a given category inherit from the requisite tag. This inheritance relationship is used to arbitrate the appropriate underlying implementation for a given algorithm.

A notional range represents a notional, rather than actual, range. An example would be the range of numbers between 0 and 99, represented in asymmetric range notation as [0, 100). This range does not "exist" in the sense that we have an array of 100 ints, with the corresponding values. Of course, we can create STL iterators that can give some semblance of reality to this range if we wished, and this may be the approach for some folks who've not yet seen the magic of ranges—but it is not necessary. With ranges, you can avoid the complexity, and simply create a class such as STLSoft's integral_range<>. (This class supports both method and operator syntax, for expository purposes. The STLSoft Range library implementation relies on the operator syntax.)

Notional range instances can only be enumerated via the advance() method or ++ operator. Given a function f, you would apply it to all the elements in a notional range as follows (this is the implementation of the r_for_each() algorithm for notional ranges):

for(; r; ++r)
{
   f(*r);
}

An iterable range holds a pair of STL iterators. There's nothing more to it, other than that iterable range types must also provide begin() and end() methods, where begin() returns the current position.

The last kind of range is the most interesting, and where the concept represents a manifest win over the Iterator concept. An indirect range is one whose elements may not be accessed in an incremental manner directly from the client code context in which the range instance is declared and/or manipulated. Rather, the elements may only be accessed indirectly by passing in functions (or function objects), which will be applied to each element in turn. Practically speaking, the indirect range maps to callback enumeration APIs.

Range Algorithms

The benefit from the abstractions of the range categories is that you're able to manipulate instances of the different range types in a common manner. Say that you're writing a multiplatform GUI and you need to dump all the available fonts into a vector<string>. On one platform, the available fonts might be read from a configuration file, and they're presented in an STL container. On an embedded device, there may be only two available fonts for that architecture known at compile time. On another, the fonts are available from the system in a callback enumeration API. Rather than having lots of #ifdefs and different code to process these things, you can write your client code once, using your gui_font_range class, whose implementation handles the platform specifics. You may then load them to your vector<string> in the following manner:

gui_font_range  fonts  = system::get_fonts();
vector<string>  names;
r_copy(fonts, std::back_inserter(names));

Pretty neat, eh? r_copy() is a range algorithm that takes a source range and an output iterator, and copies all items from the range to the iterator. Irrespective of the underlying representation, the font names are pushed onto the names vector. The sample code accompanying this article (available at http://www.cuj.com/code/) includes this example, along with three range classes demonstrating the different range type functionality.

Let's look at how the algorithms work using r_count(), which simply returns the number of elements in a range that match a given value. The implementation of this function is in Listing 1, along with some of its supporting functions. There are four points to note from this. First, the main r_count() function actually calls one of the three worker overloads, which is selected by the third parameter of the three supporting functions. This parameter is defined as a reference-to-const range tag, so the compiler's overload resolution mechanism selects the correct underlying algorithm for the given range. The compiler elides this parameter in optimized builds.

As indicated by the inheritance relationship between notional_range_tag and iterable_range_tag, an iterable range may be passed to an algorithm written for notional ranges, though the reverse, of course, is not true. Further, indirect ranges cannot be passed to algorithms written solely for notional and iterable ranges, and vice versa. Most algorithms in the library are compatible with all three types, but that need not be the case: Compatibility is defined as appropriate to a given task. Table 1 presents a compatibility matrix of the standard algorithms in the STLSoft implementation.

So What's Wrong with Ranges?

Again, we can say that there's nothing wrong with ranges, as far as they go. But devotees of the STL Iterator concept may see some limitations. First, the concept per se does not provide access to individual elements. Take, for an example, the C++ Standard Library algorithm find(), which is declared as:

template<typename I, typename V>
I find(I first, I last, V const &value);

This algorithm takes three arguments: two iterators representing the range bounding the search and the value to search for. If the element is not found, then it returns an iterator that is equal to the last. If the element is found, the returned iterator refers to that element.

Early in the refinement of the concept, we expended sorely needed brain cells attempting to work out how to emulate all the algorithms for all categories. Despite concocting a variety of convoluted—and technologically too complex—solutions, we failed to see the obvious point: Not all STL algorithms should be emulated. Ranges are not a replacement for iterators. They are concerned not with individual elements, but with sets of elements (or filtered subsets of other sets) as a whole. So it is simply a nonissue: If you need to deal with individual elements, then iterators are the tools you need. (Having said that, however, iterable ranges, based on iterator pairs, do provide such mechanisms, which we'll deal with in the next installment, so you may have the full power of both iterators and ranges when dealing with iterable ranges only.)

The second issue is equally obvious. Ranges do not provide a mechanism for the removal of elements. You cannot remove elements from notional collections and callback enumeration APIs anyway, so this is not much of a problem. With iterable ranges, the validity of removal of the "current" element from a range would depend on the relationship between the underlying iterator and collection, which cannot be assumed valid in the general case. However, again, this is not really a flaw since removal is not the purpose of ranges. They are primarily a mechanism for accessing collections of elements in a uniform fashion. (Nonetheless, the effect of removal can be achieved with ranges by the use of filtering, which we will discuss in the next installment.)

The final problem is that indirect range types must provide the corresponding member functions for whichever algorithms they are intended to be used with; see Listing 2. As it stands, this approach betrays the STL ethos of a separation of containers from algorithms. Worry not, though; we are working on a couple of approaches to address this. Matthew is currently working on techniques to use Bolt-ins [7] (templates that derive from their primary parameterizing type to provide significant behavioral enhancement/modification) to eliminate the boilerplate code in indirect range classes. The current state of this work is very positive, with support for a wide range of compilers. We've also implemented a technique for adapting minimalist indirect range types inside algorithms. All that is required of these basic indirect range types is that they derive from the basic_indirect_range_tag and provide the template member function for_each_cancellable() (Listing 3). With this, indirect ranges become separated from algorithms, in the spirit of STL: Indirect ranges need only provide the for_each_cancellable() method, which is analogous to, and no more onerous to code than, the begin()/end() methods provided by STL containers. All the algorithms are effectively coded independently of the indirect range, as shown by this overload of the r_count_impl() functions presented in Listing 1, which simply adapts a basic indirect range instance to the (full) indirect range concept via the indirect_range_adaptor<> template.

template <typename R, typename T>
size_t r_count_impl(R r, const T &val, basic_indirect_range_tag 
  const &)
{
   return indirect_range_adaptor<R>(r).count(val);
}

The Best of Both Worlds

Since iterable ranges are able to support all the STL iterator-based behavior (including all those algorithms), you can support those behaviors in range algorithms. This is where the range-tag selection mechanism comes into its own. For all those algorithms/operations that are appropriate only to iterable ranges, we define them only for the iterable_range_tag. For example, the r_generate() algorithm has only one worker function:

template <typename R, typename F>
inline void r_generate_impl(R r, F f, iterable_range_tag const &)
{
    std::generate(r.begin(), r.end(), f);
}
template <typename R, typename F>
inline void r_generate(R r, F f)
{
    r_generate_impl(r, f, r);
}

In a very real sense, iterable ranges provide the advantages of both iterators and ranges, while sacrificing the power of neither. John's focus is on providing powerful manipulation of iterable ranges, and seeing how much power can be squeezed into as little code as possible without sacrificing discoverability. We'll take a look at the fruits of some of this effort in the next installment.

Conclusion

The STL Iterator concept represents a powerful mechanism for manipulating sequence collections, but there are types of collections for which it cannot be used, and other types for which its use is cumbersome. Further, using STL iterators outside the neat and succinct algorithm form can be messy and verbose. The Range concept ably addresses all of these issues.

The Iterator concept lets you manipulate a large but incomplete set of collections, with a large and complete set of algorithms; conversely, the Range concept lets you manipulate a complete set of collections, but with a reduced set of algorithms. As with pretty much everything is software engineering, you can't completely have your cake and eat it.

In the next article, we'll discuss the further advantage of ranges over STL iterators—providing powerful filtering and composition of iterator-based ranges in a succinct and discoverable syntax.

Acknowledgments

Thanks to Dave Brooks, Greg Peet, Thorsten Ottosen, and Walter Bright, for their customarily useful insights and criticisms.

References

[1] http://www.stlsoft.org/.

[2] http://www.rangelib.org/.

[3] See Chapter 17 of Imperfect C++, Matthew Wilson, Addison-Wesley, 2004.

[4] "Adapting Callback Enumeration APIs to the Input Iterator Concept" by Matthew Wilson (CUJ, February 2004).

[5] The STLSoft Range library is fully compatible with Borland 5.6.4, CodeWarrior 8, Comeau 4.3.3, Digital Mars 8.41, GCC 3.4, Intel 8, Visual C++ 7.0 and 7.1, and also partially compatible with other compilers and older versions of these compilers.

[6] The STLSoft ranges do not provide operator bool(), since that operator is subject to unwanted conversion and comparison operations. A treatment of this issue, along with an explanation of the safe and portable solution used in the STLSoft libraries, is given in Chapter 24 of Imperfect C++.

[7] The Bolt-in concept is described in Chapter 22 of Imperfect C++, by Matthew Wilson, Addison-Wesley, 2004.