Revisiting UNIX Filters

C/C++ Users Journal, May 2005

Redirecting programs and creating pipelines

By Christopher Diggins


Welcome to the first installment of "Agile C++." I chose the name Agile C++ because it evokes my intention to introduce new ways to use C++ as a high-level language, not because it is specifically about agile software development. C++ has an enormous untapped potential for rapid application development, which I will explore in this and future articles.

UNIX Filters

I've long observed that a significant percentage of programs (or parts of programs) behave like UNIX filters. In other words, they read from the standard input stream (std::cin) and write to the standard output stream (std::cout). This approach is particularly popular on UNIX systems because most UNIX users know how to use the shell to redirect the input and output of programs. When you redirect standard input/output of one or more programs, whether in a script or on the command line, you are essentially creating a new program (albeit a potentially temporary one) that reuses the original(s). This is perhaps the most effective example of code reuse you can find.

Redirecting programs and creating pipelines can be done in C++ on certain operating systems by using the exec family of functions. For instance, given a program to_upper.exe that echoes the standard input while changing lowercase letters to uppercase, you can write the program in Listing 1. This is an interesting technique, but it has several disadvantages that make it impractical in most applications.
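As a rough illustration of the exec-based approach (a sketch, not the article's Listing 1), the following assumes a POSIX system with pipe, fork, dup2, and execvp. The helper name pipe_through is hypothetical. It passes a string through an external program and collects that program's standard output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (POSIX only): runs args[0] with the remaining arguments,
// feeds `input` to its standard input, and returns its standard output.
// The input is written in full before any output is read, so this sketch is
// only safe for inputs that fit in the pipe buffer.
inline std::string pipe_through(std::vector<std::string> args,
                                const std::string& input) {
    assert(!args.empty());
    int to_child[2];
    int from_child[2];
    if (pipe(to_child) != 0) return std::string();
    if (pipe(from_child) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return std::string();
    }

    const pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return std::string();
    }
    if (pid == 0) {
        // Child: make the pipe ends its stdin/stdout, then replace the image.
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        std::vector<char*> argv;
        for (std::string& arg : args) argv.push_back(&arg[0]);
        argv.push_back(nullptr);
        execvp(argv[0], argv.data());
        _exit(127);  // reached only if execvp failed
    }

    // Parent: keep the write end of one pipe and the read end of the other.
    close(to_child[0]);
    close(from_child[1]);
    std::size_t sent = 0;
    while (sent < input.size()) {
        const ssize_t n =
            write(to_child[1], input.data() + sent, input.size() - sent);
        if (n <= 0) break;
        sent += static_cast<std::size_t>(n);
    }
    close(to_child[1]);  // signals end of input to the child

    std::string output;
    char buffer[4096];
    ssize_t n;
    while ((n = read(from_child[0], buffer, sizeof buffer)) > 0)
        output.append(buffer, static_cast<std::size_t>(n));
    close(from_child[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    return output;
}
```

Calling pipe_through({"tr", "a-z", "A-Z"}, text) uses the standard tr utility as a stand-in for to_upper.exe. The amount of platform-specific bookkeeping needed for one small filter hints at why the approach is hard to justify in most applications.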

This raises the question: if you have the source to a filter written in C++, why can't you reuse it in another program as is? Actually, you can. Both basic_istream and basic_ostream derive from the class template basic_ios, which has a member function rdbuf():

streambuf* rdbuf() const;
streambuf* rdbuf(streambuf* sb);

The first form of rdbuf returns a pointer to the stream's current streambuf object, while the second form installs a new streambuf object and returns a pointer to the old one. This is all you need to redirect the standard streams.
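One way to put the two forms of rdbuf to work is a small guard object that installs a replacement buffer and restores the original when it goes out of scope. This is a minimal sketch. The names ScopedRedirect and capture_cout are illustrative and do not come from the article or the OOTL.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: swaps a stream's buffer for the lifetime of the guard
// and puts the original buffer back on destruction.
class ScopedRedirect {
public:
    ScopedRedirect(std::ios& stream, std::streambuf* replacement)
        : stream_(stream), saved_(stream.rdbuf(replacement)) {
        assert(replacement != nullptr);
    }
    ~ScopedRedirect() { stream_.rdbuf(saved_); }

    ScopedRedirect(const ScopedRedirect&) = delete;
    ScopedRedirect& operator=(const ScopedRedirect&) = delete;

private:
    std::ios& stream_;
    std::streambuf* saved_;  // the buffer returned by rdbuf(replacement)
};

// Returns everything `body` writes to std::cout while it runs.
template <typename Body>
std::string capture_cout(Body body) {
    std::ostringstream captured;
    {
        ScopedRedirect guard(std::cout, captured.rdbuf());
        body();
    }  // std::cout gets its original buffer back here
    return captured.str();
}
```

Because the destructor performs the restore, std::cout is reset even if the body throws.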

Now consider the source of the naïve implementation of to_upper.exe in Listing 2. As it stands, this code can't be reused; you can, however, trivially rewrite the code and break it up into two files. The first is the header file to_upper.hpp in Listing 3(a), then the simple cpp file to_upper.cpp in Listing 3(b).

Having done that, you can now reuse to_upper.hpp as a subprogram in other programs by redirecting standard input or standard output. For instance, Listing 4 behaves precisely like the original example. The implementation is entirely found in the header file, input_redirection_demo.hpp (Listing 5).
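The idea can be sketched as follows. This is not a reproduction of the article's listings: to_upper_main is a hypothetical name for the filter body that a header such as to_upper.hpp might expose, and run_to_upper is an illustrative wrapper that redirects both standard streams around the call.

```cpp
#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>

// Stand-in for the header's contents: the filter body as an ordinary
// function (hypothetical name) rather than main().
inline int to_upper_main() {
    char c;
    while (std::cin.get(c)) {
        std::cout.put(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    return 0;
}

// Feeds `input` to the filter through std::cin and returns what the filter
// wrote to std::cout.
inline std::string run_to_upper(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    std::streambuf* old_in = std::cin.rdbuf(in.rdbuf());
    std::streambuf* old_out = std::cout.rdbuf(out.rdbuf());

    const int status = to_upper_main();

    // rdbuf(sb) also calls clear(), so the end-of-file state the filter left
    // on std::cin is reset when the original buffer is reinstalled.
    std::cin.rdbuf(old_in);
    std::cout.rdbuf(old_out);
    assert(status == 0);  // the filter reports success as main() would
    return out.str();
}
```

The filter itself is unchanged; only the buffers behind std::cin and std::cout differ while it runs.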

There are a few caveats to consider when reusing programs in this manner. The most important is that this technique works only if the subprogram accesses its standard input and output solely via std::cout and std::cin and, furthermore, uses these streams only during the execution of main.

Notice that I placed the implementation for input_redirection_demo.cpp in a header file and followed the good practices outlined above. Consequently, you can now reuse it, without modification, by including the header input_redirection_demo.hpp. See Listing 6.

Syntactic Sugar

For a technique to be useful, it must be as easy to use as possible. I also want to make this technique as accessible as you might expect from a very high-level language, such as Python or Perl. To that end, I've written a filter library as part of the Object-Oriented Template Library (OOTL) [1], which overloads the greater-than operator (>) to allow the redirection of streams. Every function with one of the common main signatures, int main(int, char*[]), int main(), or the nonconformant (but all too common) void main(), can be wrapped using a Filter object. This lets you write the previous to_upper program as in Listing 7.
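To give a feel for how such an overload might look, here is a minimal sketch. It is an assumption-laden illustration, not the OOTL's interface: the Filter and BoundFilter types below are hypothetical, only the int main() signature is handled, and the real library may be organized quite differently.

```cpp
#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>

// Hypothetical wrapper around a main-like function; not the OOTL's class.
class Filter {
public:
    explicit Filter(int (*entry)()) : entry_(entry) {
        assert(entry != nullptr);
    }
    int run() const { return entry_(); }

private:
    int (*entry_)();
};

// Result of `source > filter`: remembers the input until a sink is supplied.
struct BoundFilter {
    std::istream& source;
    Filter filter;
};

inline BoundFilter operator>(std::istream& source, const Filter& filter) {
    return BoundFilter{source, filter};
}

// `bound > sink` swaps the standard buffers, runs the filter, restores them,
// and returns the filter's exit status.
inline int operator>(const BoundFilter& bound, std::ostream& sink) {
    std::streambuf* old_in = std::cin.rdbuf(bound.source.rdbuf());
    std::streambuf* old_out = std::cout.rdbuf(sink.rdbuf());
    const int status = bound.filter.run();
    std::cin.rdbuf(old_in);
    std::cout.rdbuf(old_out);
    return status;
}

// Example filter body with an int main() style signature.
inline int to_upper_main() {
    char c;
    while (std::cin.get(c)) {
        std::cout.put(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    return 0;
}
```

With these definitions, an expression such as in > Filter(to_upper_main) > out groups left to right, so the input is bound first and the filter runs when the output stream is supplied.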

You may want to write more sophisticated filter objects, so that they can have a persistent state. To accommodate this, the redirection operator can be applied to any object that derives from AbstractFilter. The new class must simply provide a public implementation of the abstract function:

virtual int Impl() = 0;
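A filter with persistent state might look like the following sketch. The AbstractFilter class here is a hypothetical stand-in modelled only on the description above (a single abstract Impl function). LineNumberer and run_filter are illustrative names, and the OOTL's redirection operator is replaced by an explicit helper.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical base class modelled on the description in the text; this is
// not the OOTL's definition.
class AbstractFilter {
public:
    virtual ~AbstractFilter() {}
    virtual int Impl() = 0;
};

// A filter with persistent state: numbering continues across runs.
class LineNumberer : public AbstractFilter {
public:
    int Impl() override {
        std::string line;
        while (std::getline(std::cin, line)) {
            std::cout << ++count_ << ": " << line << '\n';
        }
        return 0;
    }
    int lines_seen() const { return count_; }

private:
    int count_ = 0;
};

// Runs `filter` with std::cin and std::cout temporarily backed by strings.
inline std::string run_filter(AbstractFilter& filter,
                              const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    std::streambuf* old_in = std::cin.rdbuf(in.rdbuf());
    std::streambuf* old_out = std::cout.rdbuf(out.rdbuf());
    const int status = filter.Impl();
    std::cin.rdbuf(old_in);
    std::cout.rdbuf(old_out);
    assert(status == 0);  // a nonzero status would signal failure
    return out.str();
}
```

Because the counter lives in the object rather than in a function, a second run on the same LineNumberer picks up the numbering where the first one stopped.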

Pipelines

The convenience of the syntax becomes more apparent when you want to construct pipelines where output from one filter is input into another. For this task, the OOTL further overloads the greater-than operator (>). This purposefully breaks from the tradition of UNIX shell scripts. UNIX shell scripts require three separate operators (<, >, and |) to determine how to interpret the different identifiers, but in a strongly typed language such as C++, the types of the identifiers make the interpretation of statements unambiguous. Consider Listing 8.

When using two filters side-by-side in a pipeline, you have to be aware that there is no multithreading occurring, and as a result, the output of one filter is fully buffered before being sent to the next one. The advantage of this is simplicity. The disadvantage is that you run the risk of memory exhaustion. One imperfect workaround is to create a temporary file and use it to buffer the output. Another option is to use the Boost Iostreams library.
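The buffering behavior described above can be sketched without any operator overloading. In this illustration, which is an assumption about the general mechanism rather than the OOTL's implementation, run_pipeline and the two filter functions are hypothetical names. Each stage runs to completion and its whole output is held in a string before the next stage starts.

```cpp
#include <cassert>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

typedef int (*FilterMain)();  // a filter body with an int main() signature

// Runs the stages one after another on a single thread. The complete output
// of each stage is buffered in memory and becomes the next stage's input.
inline std::string run_pipeline(const std::string& input,
                                const std::vector<FilterMain>& stages) {
    std::string data = input;
    for (FilterMain stage : stages) {
        assert(stage != nullptr);
        std::istringstream in(data);
        std::ostringstream out;
        std::streambuf* old_in = std::cin.rdbuf(in.rdbuf());
        std::streambuf* old_out = std::cout.rdbuf(out.rdbuf());
        stage();
        std::cin.rdbuf(old_in);
        std::cout.rdbuf(old_out);
        data = out.str();  // the full intermediate result lives here
    }
    return data;
}

// Example stage: uppercases every character.
inline int to_upper_main() {
    char c;
    while (std::cin.get(c)) {
        std::cout.put(static_cast<char>(
            std::toupper(static_cast<unsigned char>(c))));
    }
    return 0;
}

// Example stage: removes decimal digits.
inline int drop_digits_main() {
    char c;
    while (std::cin.get(c)) {
        if (!std::isdigit(static_cast<unsigned char>(c))) std::cout.put(c);
    }
    return 0;
}
```

The string named data is where the memory cost shows up: its size is bounded only by the output of the previous stage.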

The Boost Iostreams Library

The Boost Iostreams library [2], which has been accepted for inclusion in Boost, is expected to be part of the next major release. It provides a general framework for defining Standard-conforming iostreams and attaching chains of filters to them. For example, a to_upper_filter could be defined as in Listing 9 using the Boost Iostreams library. Listing 10 is one example of using such a filter. The function get() in the filter is defined as a member template. In more complex examples, it may be necessary to define helper functions that are themselves member templates. The resulting syntactic complexity can make code hard to manage and understand.

Conclusion

I've found that the technique of subprogram input/output redirection has come in handy when engaging in test-driven development. It's saved me a lot of time while developing test and demonstration programs for the OOTL. This is the beginning of my push toward harnessing the more agile possibilities of C++.

Acknowledgments

Special thanks to Jonathan Turkanis for taking the time to carefully review this article and for his generous and significant contributions to the open-source C++ development community.

References

[1] http://www.ootl.org/.
[2] The most recent version is available in the Boost CVS repository (http://www.boost.org/). An earlier version is available at http://www.kangaroologic.com/iostreams/.