Ranges, Part 2: Iterable Range Adaptors, Algorithms, and Composition

C/C++ Users Journal December, 2004

Making code compact, easier to understand, and less error prone

By John Torjo and Matthew Wilson

John Torjo is a freelancer and consultant who specializes in C++, generic programming, and streams. He can be reached at john@torjo.com. Matthew Wilson is a software development consultant for Synesis Software and creator of the STLSoft libraries. He is author of the book Imperfect C++ (Addison-Wesley, 2004). Matthew can be contacted via http://imperfectcplusplus.com/.

STL introduced a wonderfully novel idea—separating algorithms from the containers. Thus, iterators were born, and yes, they have proven to be of great use. However, if you try to use them in nontrivial cases such as iterator composition (say to filter a vector, transform each resulting element, and then use it), they become hard to use. Moreover, when you use iterators in algorithms, you always have to specify some_array.begin() and some_array.end(). This can be tedious and makes code much less readable. Furthermore, when you code manual loops, again, the code is verbose, error prone, spans over multiple lines, and is hard to read. But don't worry—ranges are here to help!

In this installment of our series on ranges, we present "Iterable Ranges"—that is, a pair of iterators. Or, if you like, a smarter std::pair<iterator, iterator>. To keep things simple, we use the term "range" from now on to denote an Iterable Range, unless stating otherwise. The classes and functions described in this article come from John's implementation based around the Boost libraries, which is exclusively concerned with giving power to the Iterable Range concept (available at http://rangelib.org/ and http://www.torjo.com/code/ranges.zip).

Range Basics

The first step in making simple things simple is dealing with manual loops. Listing 1 is a hand-coded while/for loop, as opposed to using an STL algorithm.

Since an Iterable Range holds a pair of iterators—begin and end—it can provide any overloaded operators provided by its underlying iterator, such as ++, —, +=, [], *, and ->. Hence, if an iterator provides operator +=(), the range provides it, too; if not, neither will the range, and a compile-time error occurs if you try to use it. Thus, the underlying iterator's category (input_ iterator_tag, forward_iterator_tag, and so on) is preserved. When you call such an overloaded operator, it is as if you are operating on the begin iterator. This gives you a great amount of flexibility. All you need to remember is:

Ask if the range is open. Use operator bool() (or its is_open() method, if you're not using operator syntax with your ranges). This is equivalent to testing begin() != end() on the underlying iterators.
Dereference the range via operator*() (or current()). This is equivalent to *begin() or begin->something.
Advance the range. Use any of ++, —, +=, -=; equivalent to ++begin(), --begin(), and so on.

When coding manual loops, remember that ranges are easily traversed; see Listing 2.

For convenience, we have provided the begin() and end() functions (getters), and begin(new_val) and end(new_val) functions (setters); begin(new_val) and end(new_val) change the values of the begin and end iterators, respectively. You rarely need them, but there are situations where they come in handy.

crange versus irange

Since a range is a pair of iterators, it makes sense to be able to use it for both containers and iterators. In other words, ranges should let you walk through a container, but you should also, given iterators A and B, be able to walk from iterator A to iterator B (assuming B is reachable from A).

As you've seen from the listings so far, the ranges are template classes. This is to be expected since we need to be efficient and use compile-time type resolution—using a range incurs no overhead over using the iterators themselves: crange<what_goes_here> c. For what_goes_here you could wish for the iterator type, container type, or value type (what dereferencing an iterator will yield).

The value type was suggested by David Abrahams in a post to the Boost-developer mailing list [2]. It's much simpler than the first two; see Listing 3. However, compilers today are poor at optimizing such code, so such an implementation was just too slow [3]. Until C++0x [4], we choose to go with the first two options.

When constructing a range in your code, you know either the container or the iterators you want to traverse. Hence, you may use the:

irange class—irange<iterator_type> r(from, to)—if you know the iterator type.
crange class—crange<container_type> r(cont), or crange<container_type> r(from, to)—if you want a mutable (non-const) iterator through (part of) the container.
crange class—crange<const container_type> r(cont), crange<const container_type> r(from, to)—if you want a nonmutable (const) iterator through (part of) the container.

If you know the container type, then you know its iterator type(s), so you may use either irange or crange. We recommend you use crange because it's more succinct:

crange<container> instead of irange<container::iterator>.
crange<const container> rather than irange<container::const_iterator>.

Take another look at Listing 1, which is the typical use of ranges when used in manual loops.

Iterable Range Algorithms

So far, ranges make it much easier to code manual loops. That's a great advantage, but experts say you should prefer using STL algorithms when possible—they're reliable, represent a widely known and accepted form, and make code more readable [5]. So, let's adapt the STL algorithms so that they can be used directly with containers/ranges. STL algorithms have been adapted to work for containers and ranges:

Instead of saying std::some_algorithm(), use rng::some_algorithm().
Instead of passing cont.begin(), cont.end(), or range.begin(), range.end(), say cont or range.

Listing 4 presents examples to get you started, but you can do better. From experience, a lot of code tries to find an iterator (such as std::find(), std::find_if(), std::lower_bound(), and so on), tests if it's valid, and if so, uses it. Or, you find an iterator and walk from that iterator up to the end().

Iterable Range algorithms fit these scenarios like a glove. Where an STL algorithm would return an iterator, its corresponding range algorithm will return a range, from the "found" iterator up to the end. Usually this is exactly what you need; if not, keep reading—this behavior is overrideable. Listing 5 presents examples of using such range algorithms.

As mentioned in the first installment, algorithms that return an iterator are of a different breed. You use such an algorithm to:

Do something to the iterator you get in return.
Walk from that iterator to the end.
Walk from the beginning up to the iterator.

In the first two cases, you'll typically be happy with the default range algorithm behavior—you do need to test if the iterator is valid, and if so, use it (or walk to the end). In case you want the algorithm to:

Return only the iterator, and behave like the STL algorithm. Use rng::algorithm<iter>(...) instead of rng::algorithm(...).
Return the [begin, found) range (case 3.). Use rng::algorithm<from_beg>(...) instead of rng::algorithm(...).

Finally, in addition to the STL algorithms, we've provided two extra:

rng::erase(container, range), which erases the range from the container (equivalent to container.erase( range.begin(), range .end()). A useful correlation to remove algorithms (std::remove(), std::unique(), and so on).
rng::coll_find(coll, value), which finds the value in the STL-like collection and returns a range from the found value to the end.

Range Adaptors and Composition

This is all well indeed, but is it more than a syntactic ploy? We've saved the best for last—range adaptors. A range adaptor adapts an iterator class in order to be used in range algorithms and/or composed.

Iterator classes typically end up being redundant and hard to use in code, but providing a range adaptor makes them very easy to use and compact as well. Take the boost::filter_iterator example. boost:: filter_iterator creates a filter over an iterator sequence—it filters only the elements that match a certain predicate. However, if you use it in STL algorithms, you have to code a lot—compare the raw iterator version with the range adaptor (Listing 6).

Better still, you can compose such range adaptors. For example, you can filter a vector of strings—take only those that begin with a lowercase, then transform them—get each word's length, and apply a range algorithm to this result—and then copy them into another container or output them to the console. Without range adaptors, this could have gotten pretty ugly.

The library comes with a few adaptors, but you can adapt your own iterator classes. It is not very difficult (when downloading the code, take a look at code in rangelib/filter.hpp).

There are a number of range adaptors already defined: filtered, transformed, breaked, indirected, resorted, reversed, and generated. You can use each adaptor as a:

Function, returning a range adaptor that you either compose or use in a range algorithm.
Range object, which you can manually traverse. Just append _range to the aforementioned names. You'll rarely need this—just remember that internally the function returns the _range object counterpart.

More specifically:

filtered(range, predicate) filters the range, yielding another range whose elements match the given predicate (similar to boost::filter_iterator).
transformed(range, transform_func) transforms each element in the range by applying transform_func (similar to boost::transform_iterator). transform_func can be a function or a function object with a result_type typedef—the adaptor needs to know what elements the new range will contain.

Furthermore, you can use transformed(pair_1st(some_stl_collection)) or transformed(pair_2nd(some_stl_collection)), yielding the keys from the collection in the former case or the values in the latter. This is helpful when copying associative collections into arrays.

breaked(range, break_predicate) yields the original range up to the element for which break_predicate returns false; similar to a manual loop with an internal if( !pred(*iter)) break;. This is useful for manipulating only the first n elements from a range or, for example, in a sorted range of integer values, to manipulate only the negative values.
indirected(range) dereferences each element in a range. Useful when you have a container of pointers, but you want to manipulate the pointed values (similar to boost::indirect_iterator).
resorted(range [,pred]) returns a range made up of the elements from the original range, resorted, leaving the original range unchanged. This is useful when you have a complex container and need different views for it. For example, you have an array of files and you want to view it ordered by, say, name, size, attributes, or creation time. You can compose it with breaked, and get, for example, the top 10 files, sorted by size—breaked(resorted(cont,by_size), 10).

To use resorted, the iterators that make up the range need to be at least forward iterators.

reversed(range) returns the original range in reversed order (similar to std::reverse_iterator).
generated(generator, stopper) creates a range by using a generator function object. After generating an element, the stopper is asked if that element belongs to the range. The range will end when the stopper returns false. (This is similar to std::generate() and std::generate_n() algorithms. However, it's much more powerful because you can use it in any range algorithm or even compose it!)

Using the range adaptors in algorithms and composing them is straightforward; see Listing 7. Also, when you download the code, you'll see that each adaptor has a corresponding source file providing examples of its use.

Conclusion

In short, ranges are a powerful concept, and using them makes code more compact, easier to understand, and less error prone. In addition, Iterable Ranges in particular make the STL (containers and algorithms) even easier to use.

Acknowledgments

Thanks to Dave Brooks, Greg Peet, Thorsten Ottonsen, and Walter Bright for useful and challenging feedback.

References

[1] Wilson, Matthew and John Torjo. "Ranges, Part 1: Concepts and Implementation," CUJ, October 2004.

[2] http://lists.boost.org/MailArchives/boost/msg63750.php.

[3] In May this year, John wrote a test that would measure using the latter form against using raw iterators. The results were very discouraging, as the range implementation speed was sometimes less than half compared to raw iterators (http://lists.boost.org/MailArchives/boost/msg64254.php).

[4] There is talk that C++0x will enhance the auto keyword, so that it will be a placeholder for a type deduced by the compiler from the surrounding expression. This would improve generic programming, and using templated classes would become easier. You could simply write:

     // assuming get_range() is a function, 
     // returning a crange<container_type> 
     for ( auto r =get_range(cont);  r; ++r) whatever(*r);

[5] Indeed, you should generally prefer algorithms to manual loops. However, there are cases when a manually coded loop using crange<> is easier to use and understand than its rng::for_each() counterpart. This can happen when the loop needs a lot of data that is present in its (function) scope. Otherwise, you would need to pass that data to a function—the one you would call with rng::for_each(). If this hurts readability, then it's probably not worth doing the latter.