Ranges, Part 2: Iterable Range Adaptors, Algorithms, and Composition

C/C++ Users Journal December, 2004

Making code compact, easier to understand, and less error prone

By John Torjo and Matthew Wilson

John Torjo is a freelancer and consultant who specializes in C++, generic programming, and streams. He can be reached at john@torjo.com. Matthew Wilson is a software development consultant for Synesis Software and creator of the STLSoft libraries. He is author of the book Imperfect C++ (Addison-Wesley, 2004). Matthew can be contacted via http://imperfectcplusplus.com/.

STL introduced a wonderfully novel idea—separating algorithms from the containers. Thus, iterators were born, and yes, they have proven to be of great use. However, if you try to use them in nontrivial cases such as iterator composition (say to filter a vector, transform each resulting element, and then use it), they become hard to use. Moreover, when you use iterators in algorithms, you always have to specify some_array.begin() and some_array.end(). This can be tedious and makes code much less readable. Furthermore, when you code manual loops, again, the code is verbose, error prone, spans over multiple lines, and is hard to read. But don't worry—ranges are here to help!

In this installment of our series on ranges, we present "Iterable Ranges"—that is, a pair of iterators. Or, if you like, a smarter std::pair<iterator, iterator>. To keep things simple, we use the term "range" from now on to denote an Iterable Range, unless stating otherwise. The classes and functions described in this article come from John's implementation based around the Boost libraries, which is exclusively concerned with giving power to the Iterable Range concept (available at http://rangelib.org/ and http://www.torjo.com/code/ranges.zip).

Range Basics

The first step in making simple things simple is dealing with manual loops. Listing 1 is a hand-coded while/for loop, as opposed to using an STL algorithm.

Since an Iterable Range holds a pair of iterators—begin and end—it can provide any overloaded operators provided by its underlying iterator, such as ++, —, +=, [], *, and ->. Hence, if an iterator provides operator +=(), the range provides it, too; if not, neither will the range, and a compile-time error occurs if you try to use it. Thus, the underlying iterator's category (input_ iterator_tag, forward_iterator_tag, and so on) is preserved. When you call such an overloaded operator, it is as if you are operating on the begin iterator. This gives you a great amount of flexibility. All you need to remember is:

When coding manual loops, remember that ranges are easily traversed; see Listing 2.

For convenience, we have provided the begin() and end() functions (getters), and begin(new_val) and end(new_val) functions (setters); begin(new_val) and end(new_val) change the values of the begin and end iterators, respectively. You rarely need them, but there are situations where they come in handy.

crange versus irange

Since a range is a pair of iterators, it makes sense to be able to use it for both containers and iterators. In other words, ranges should let you walk through a container, but you should also, given iterators A and B, be able to walk from iterator A to iterator B (assuming B is reachable from A).

As you've seen from the listings so far, the ranges are template classes. This is to be expected since we need to be efficient and use compile-time type resolution—using a range incurs no overhead over using the iterators themselves: crange<what_goes_here> c. For what_goes_here you could wish for the iterator type, container type, or value type (what dereferencing an iterator will yield).

The value type was suggested by David Abrahams in a post to the Boost-developer mailing list [2]. It's much simpler than the first two; see Listing 3. However, compilers today are poor at optimizing such code, so such an implementation was just too slow [3]. Until C++0x [4], we choose to go with the first two options.

When constructing a range in your code, you know either the container or the iterators you want to traverse. Hence, you may use the:

If you know the container type, then you know its iterator type(s), so you may use either irange or crange. We recommend you use crange because it's more succinct:

Take another look at Listing 1, which is the typical use of ranges when used in manual loops.

Iterable Range Algorithms

So far, ranges make it much easier to code manual loops. That's a great advantage, but experts say you should prefer using STL algorithms when possible—they're reliable, represent a widely known and accepted form, and make code more readable [5]. So, let's adapt the STL algorithms so that they can be used directly with containers/ranges. STL algorithms have been adapted to work for containers and ranges:

Listing 4 presents examples to get you started, but you can do better. From experience, a lot of code tries to find an iterator (such as std::find(), std::find_if(), std::lower_bound(), and so on), tests if it's valid, and if so, uses it. Or, you find an iterator and walk from that iterator up to the end().

Iterable Range algorithms fit these scenarios like a glove. Where an STL algorithm would return an iterator, its corresponding range algorithm will return a range, from the "found" iterator up to the end. Usually this is exactly what you need; if not, keep reading—this behavior is overrideable. Listing 5 presents examples of using such range algorithms.

As mentioned in the first installment, algorithms that return an iterator are of a different breed. You use such an algorithm to:

In the first two cases, you'll typically be happy with the default range algorithm behavior—you do need to test if the iterator is valid, and if so, use it (or walk to the end). In case you want the algorithm to:

Finally, in addition to the STL algorithms, we've provided two extra:

Range Adaptors and Composition

This is all well indeed, but is it more than a syntactic ploy? We've saved the best for last—range adaptors. A range adaptor adapts an iterator class in order to be used in range algorithms and/or composed.

Iterator classes typically end up being redundant and hard to use in code, but providing a range adaptor makes them very easy to use and compact as well. Take the boost::filter_iterator example. boost:: filter_iterator creates a filter over an iterator sequence—it filters only the elements that match a certain predicate. However, if you use it in STL algorithms, you have to code a lot—compare the raw iterator version with the range adaptor (Listing 6).

Better still, you can compose such range adaptors. For example, you can filter a vector of strings—take only those that begin with a lowercase, then transform them—get each word's length, and apply a range algorithm to this result—and then copy them into another container or output them to the console. Without range adaptors, this could have gotten pretty ugly.

The library comes with a few adaptors, but you can adapt your own iterator classes. It is not very difficult (when downloading the code, take a look at code in rangelib/filter.hpp).

There are a number of range adaptors already defined: filtered, transformed, breaked, indirected, resorted, reversed, and generated. You can use each adaptor as a:

More specifically:

Furthermore, you can use transformed(pair_1st(some_stl_collection)) or transformed(pair_2nd(some_stl_collection)), yielding the keys from the collection in the former case or the values in the latter. This is helpful when copying associative collections into arrays.

To use resorted, the iterators that make up the range need to be at least forward iterators.

Using the range adaptors in algorithms and composing them is straightforward; see Listing 7. Also, when you download the code, you'll see that each adaptor has a corresponding source file providing examples of its use.

Conclusion

In short, ranges are a powerful concept, and using them makes code more compact, easier to understand, and less error prone. In addition, Iterable Ranges in particular make the STL (containers and algorithms) even easier to use.

Acknowledgments

Thanks to Dave Brooks, Greg Peet, Thorsten Ottonsen, and Walter Bright for useful and challenging feedback.

References

[1] Wilson, Matthew and John Torjo. "Ranges, Part 1: Concepts and Implementation," CUJ, October 2004.

[2] http://lists.boost.org/MailArchives/boost/msg63750.php.

[3] In May this year, John wrote a test that would measure using the latter form against using raw iterators. The results were very discouraging, as the range implementation speed was sometimes less than half compared to raw iterators (http://lists.boost.org/MailArchives/boost/msg64254.php).

[4] There is talk that C++0x will enhance the auto keyword, so that it will be a placeholder for a type deduced by the compiler from the surrounding expression. This would improve generic programming, and using templated classes would become easier. You could simply write:

     // assuming get_range() is a function, 
     // returning a crange<container_type> 
     for ( auto r =get_range(cont);  r; ++r) whatever(*r);

[5] Indeed, you should generally prefer algorithms to manual loops. However, there are cases when a manually coded loop using crange<> is easier to use and understand than its rng::for_each() counterpart. This can happen when the loop needs a lot of data that is present in its (function) scope. Otherwise, you would need to pass that data to a function—the one you would call with rng::for_each(). If this hurts readability, then it's probably not worth doing the latter.