An iterator over words is a handy tool, but an iterator over arbitrary tokens is even handier.
When I saw James M. Curran's description of his word iterator (CUJ, August 1998, p. 82), it struck me that this was just the kind of thing I needed in my current work to parse a delimited string into tokens. The only problem was that Curran's word iterator skipped whitespace to find its tokens, whereas mine would have to skip various arbitrary delimiter strings depending on the situation. This small but significant difference in requirements suggested that a more flexible iterator would be useful, one that could be adapted to use any required method of finding tokens in a string.
The Standard C++ Library achieves this kind of flexibility by assembling the required functionality from a kit of template classes, functions, and function object types (functors). Examination of Curran's WordIter class reveals that the word-finding process is neatly encapsulated in the findWord function, while the rest of the class is all iterator interface and housekeeping. I found that the findWord function could be replaced at will with the desired token-finding function, so WordIter could become the flexible iterator I was looking for.
In this article I explain how this is accomplished, and how to substitute a custom token-finder function for findWord.
Using Function Objects
Using replaceable functions usually means passing function pointers as arguments and storing them as variables. In the Standard C++ Library, the types of function arguments are usually template parameters, which allows function objects to be passed as arguments as well as function pointers. Function object types are classes that overload operator() so they can be used by name, with function syntax. For example:
    class Functor
    {
    public:
        Functor(int arg) : data(arg) {}
        int operator() (int arg1, int arg2)
            { return (arg1 + arg2) - data; }
    private:
        int data;
    };

    Functor foo(5);       // Construct foo
    int bar = foo(1, 2);  // Call Functor::operator() on foo

As class objects, functors have an advantage over ordinary functions: they can maintain non-static state data (as member variables) between invocations. This means that functors can be initialized with data, which will be available whenever the functors are subsequently used. The function-object model is ideal for the token-finder function, allowing it to be initialized with the required delimiter string before repeated use in the token-searching process. The iterator over words now becomes a specialization of the generic iterator over tokens, with a word-finding function object as its argument.
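To make the point about stored state concrete, here is a minimal sketch of a functor that is initialized once with a set of delimiter characters and then reused as a predicate. The names IsDelimiter and firstDelimiter are hypothetical and do not come from the article's listings; only std::find_if and std::string::find are real library calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical predicate: true when a character is one of the stored delimiters.
class IsDelimiter {
public:
    explicit IsDelimiter(const std::string& delims) : delimiters(delims) {
        assert(!delimiters.empty());  // assumption: at least one delimiter is given
    }
    bool operator()(char c) const {
        return delimiters.find(c) != std::string::npos;
    }
private:
    std::string delimiters;  // state kept between calls
};

// Hypothetical helper: offset of the first delimiter in text,
// or text.size() when the text contains none.
std::size_t firstDelimiter(const std::string& text, const IsDelimiter& isDelim) {
    std::string::const_iterator pos =
        std::find_if(text.begin(), text.end(), isDelim);
    return static_cast<std::size_t>(pos - text.begin());
}
```

The delimiter set is supplied once, at construction, and every later call sees it. An ordinary function would need a global variable or an extra parameter to do the same.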
Implementation
The conversion of WordIter to a generic string iterator is straightforward. The WordIter class becomes a template class (which I have renamed TokenIter) that takes the template argument TokenFinder. The TokenFinder function object (or function pointer) is stored by value in the private member variable findToken. TokenFinder must update the TokenIter start, end, and length variables, as the original findWord function did in WordIter. As TokenFinder is not a member of the iterator class, I have changed the function signature to take the TokenIter start and end pointers by reference so they can be updated, and to return the length of the token.
As a result, all variations on TokenFinder for use with TokenIter must overload operator() in the following manner:
    int operator() (const char*& start, const char*& end);

On entering this function, end points to the beginning of the string, and start is undefined. On return, if a token is found, start must point to the first character of the token, and end must point to the first character past the end of the token. If no token is found, both should point to the terminating null. The function must return the length of the token pointed to by start, or zero if no token is found.
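The contract above can be sketched with a finder that treats one fixed string as the delimiter. This is an illustration under stated assumptions, not the article's Listing 2: the names DelimiterFinder and split are hypothetical. Consecutive delimiters are skipped rather than reported as empty tokens, since a zero return already means "no token found". After each call, end is left where the next scan should resume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical finder: tokens are separated by one fixed delimiter string.
class DelimiterFinder {
public:
    explicit DelimiterFinder(const std::string& delim) : delimiter(delim) {
        assert(!delimiter.empty());  // an empty delimiter could never be skipped
    }

    // On entry, end marks where scanning resumes; start is ignored.
    int operator()(const char*& start, const char*& end) const {
        const std::size_t n = delimiter.size();
        // Skip any run of delimiters in front of the next token.
        while (std::strncmp(end, delimiter.c_str(), n) == 0) end += n;
        start = end;
        // The token runs to the next delimiter or to the terminating null.
        const char* next = std::strstr(end, delimiter.c_str());
        end = next ? next : end + std::strlen(end);
        return static_cast<int>(end - start);
    }

private:
    std::string delimiter;  // state supplied at construction
};

// Hypothetical driver: call the finder until it reports no more tokens.
std::vector<std::string> split(const char* text, DelimiterFinder finder) {
    std::vector<std::string> tokens;
    const char* start = 0;
    const char* end = text;
    for (int len = finder(start, end); len > 0; len = finder(start, end))
        tokens.push_back(std::string(start, static_cast<std::size_t>(len)));
    return tokens;
}
```

For example, split("a::b::::c", DelimiterFinder("::")) collects the three tokens a, b, and c. When the input is exhausted, the finder returns zero with both pointers on the terminating null.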
Standard Library Function Objects
When implementing function objects that take one or two arguments (known as unary and binary functions respectively) for use with the Standard C++ Library, it is good practice to derive them from the corresponding Standard C++ Library template structs unary_function and binary_function. These structs use their template arguments to define standard typenames for the operator() function arguments and return value, thus allowing both member functions and external code to refer to these types transparently:
    class Functor : public binary_function<string, int, bool>
    {
    public:
        // function
        //   bool operator() (string arg1, int arg2)
        // may be declared as:
        result_type operator() (first_argument_type arg1,
                                second_argument_type arg2);
    };

In this particular case, I decided not to derive TokenFinder from binary_function because its function signature is expressly coupled with TokenIter, and it is not likely to be used with Standard C++ Library functions or other algorithms.
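The following sketch shows how external code can use those nested typenames. Because std::binary_function was deprecated in C++11 and removed in C++17, the sketch declares the three typedefs by hand instead of inheriting them; the effect on client code is the same. The names LongerThan and callWith are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical functor: is the string longer than the given length?
class LongerThan {
public:
    // The typedefs that deriving from binary_function<string, int, bool>
    // would have supplied, written out by hand.
    typedef std::string first_argument_type;
    typedef int         second_argument_type;
    typedef bool        result_type;

    result_type operator()(const first_argument_type& text,
                           second_argument_type length) const {
        assert(length >= 0);  // assumption: negative lengths are a caller error
        return text.size() > static_cast<std::size_t>(length);
    }
};

// External code names the functor's types without knowing what they are.
template <class BinaryFn>
typename BinaryFn::result_type
callWith(BinaryFn fn,
         const typename BinaryFn::first_argument_type& a,
         typename BinaryFn::second_argument_type b) {
    return fn(a, b);
}
```

callWith works with any functor that publishes the three typenames, which is the transparency the base structs were designed to provide.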
Listing 1, TokenIter.h, shows the interface for the TokenIter class. Listing 2, TokenFinder.h, shows the interface for the TokenFinder class. Listing 3, worditer.h, shows how the original WordIter class now becomes a specialization of the TokenIter class. I have declared a WordIter typedef to show how code using the old WordIter class can be switched to the new version without change (although this is not strictly necessary, as the WordFinder function object can be used directly with TokenIter).
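Since the listings are not reproduced here, the following is a much-reduced sketch of how the pieces might fit together. It is not the code from Listings 1 to 3. The member names findToken, start, end, and length follow the description in the text, while the done() member and the exact shape of the interface are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Simplified, hypothetical token iterator parameterized on its finder.
template <class TokenFinder>
class TokenIter {
public:
    explicit TokenIter(const char* text, TokenFinder finder = TokenFinder())
        : findToken(finder), start(0), end(text), length(0) {
        assert(text != 0);
        next();  // position on the first token
    }
    bool done() const { return length == 0; }  // assumed end-of-tokens test
    std::string operator*() const {
        return std::string(start, static_cast<std::size_t>(length));
    }
    TokenIter& operator++() { next(); return *this; }

private:
    void next() { length = findToken(start, end); }

    TokenFinder findToken;  // stored by value, as described in the text
    const char* start;
    const char* end;
    int length;
};

// Word finder: skips whitespace, then takes a run of non-whitespace.
struct WordFinder {
    int operator()(const char*& start, const char*& end) const {
        while (std::isspace(static_cast<unsigned char>(*end))) ++end;
        start = end;
        while (*end && !std::isspace(static_cast<unsigned char>(*end))) ++end;
        return static_cast<int>(end - start);
    }
};

// Existing code that names WordIter keeps compiling against the new class.
typedef TokenIter<WordFinder> WordIter;
```

With this typedef, a declaration such as WordIter it("one two three") steps through the three words. Replacing WordFinder with another finder type changes the tokenizing rule without touching the iterator.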
Listing 4, TokenIterTest.cpp, is a test stub for the generic TokenIter class and WordIter.
Dave Lorde is a Senior Programmer/Analyst developing client-server financial applications for Financial Times Information in London, England. He has been programming for the past 16 years, the last six using C++ with Microsoft Windows and MFC. He can be reached at dlorde@cix.compulink.co.uk or david.lorde@ft.com. The views expressed above are those of the author alone and do not necessarily reflect the views of Financial Times Information Limited.