Data-Structure-Independent Algorithms for Image Processing

By Vitaly Ablavsky, Mark R. Stevens, and Joshua B. Pollak

With C++ compilers finally catching up to the ANSI/ISO Standard, generic programming has become a popular paradigm for obtaining good run-time performance while simultaneously facilitating code reuse. Generic programming emphasizes the separation of algorithms from the data structures they manipulate. Through the consistent application of generic programming principles, the Boost Graph Library (BGL) [1] achieves complete separation between graph data structures and graph algorithms. In addition, the run-time behavior of BGL algorithms is parameterized using compile-time idioms that do not adversely affect run-time performance.

Can the same success be repeated in the domain of image processing? In this article, we address this question by focusing on one axis of the diagram in Figure 1 — conformance with the STL. By conformance, we mean both striving for the STL-style look and feel, and using STL algorithms as building blocks for advanced image processing. Why is this a good idea? Because any developer versed in the Standard C++ Library and armed with a good book on image processing (such as Digital Image Processing, by Rafael C. Gonzalez and Richard E. Woods, Addison-Wesley, 2002) can understand and extend such code without significant trouble. However, you need to overcome two hurdles to do this:

Locating a useful image-processing library is easy — there are dozens of them, with many offering real-time performance. The gotcha is that clients are required to use library-defined data structures because the optimized routines of the library are written against that interface. For example:

namespace proprietary {
  class Image { /* ... */ };
  void adjustContrast(const Image& in, Image& out) { /* ... */ }
}

Often, the Image class is just a wrapper to a data pointer. If you have to switch from one library to another, all custom algorithms written against proprietary::Image must be reimplemented.

The VIGRA image-processing library [2] demonstrates the use of generic programming techniques to remove the dependence between the image representation and the processing. VIGRA extends the STL design philosophy by introducing new collections of operators and iterators for spatial data. For instance, VIGRA's transform2D applies a user-defined functor to every element of an image. While the resulting separation of data structures and algorithms is a plus, the reliance on a custom STL-like infrastructure is unsatisfactory. Consequently, you need to remove the need for such a framework whenever possible (see the right portion of Figure 1). The test for success is the ability to rely instead on std::transform.
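
To make the test concrete, here is a minimal sketch of a purely per-pixel operation built on nothing but std::transform. The image type, the Invert functor, and the function name are our own illustrations, not part of any library: the point is that a flat pixel range plus a functor is all the infrastructure a pixel-wise algorithm needs.

```cpp
#include <algorithm>
#include <vector>

// Example functor for a per-pixel operation (our own name, not a library's).
struct Invert {
    unsigned char operator()(unsigned char p) const {
        return static_cast<unsigned char>(255 - p);
    }
};

// Inverts an 8-bit gray-scale image stored as a flat vector of pixels.
// No custom framework is required: std::transform is the whole algorithm.
std::vector<unsigned char> invertImage(const std::vector<unsigned char>& img)
{
    std::vector<unsigned char> out(img.size());
    std::transform(img.begin(), img.end(), out.begin(), Invert());
    return out;
}
```
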

To help motivate the type of code you wish to write, we examine a traditional image-processing algorithm known as "median filtering" (see sidebar entitled "Image Processing"). If asked to generate a median-filtering operation, your first attempt may produce something along the lines of Listing 1. Here, there are four nested loops, with the outer two visiting each pixel in the image, and the two inner loops visiting each pixel in the sliding window neighborhood. Not only are the explicit for loops unnecessary, the approach does not scale. In medical imaging (CTs and MRIs), three-dimensional images lead to six nested for loops!
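
Listing 1 is not reproduced in this excerpt, but a sketch in its spirit, with our own names and a 3x3 window, might look like this. Border pixels are simply left untouched for brevity:

```cpp
#include <algorithm>
#include <vector>

// Naive 3x3 median filter: four nested loops. The outer two walk every
// interior pixel; the inner two gather the 3x3 neighborhood.
void medianFilterNaive(const unsigned char* in, unsigned char* out,
                       int width, int height)
{
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            std::vector<unsigned char> ngb;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    ngb.push_back(in[(y + dy) * width + (x + dx)]);
            // The median of 9 values is the 5th after partial sorting.
            std::nth_element(ngb.begin(), ngb.begin() + 4, ngb.end());
            out[y * width + x] = ngb[4];
        }
    }
}
```
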

A closer look at how sliding windows access image data may lead to a better implementation. First, the two outer loops are stepping over each pixel in the image, and could be replaced with a conceptual for all pixels loop (for example, std::transform). At each pixel, the sliding window is formed based on the underlying spatial data, but then the data is treated as linear (in fact, it is copied into a linear array). Here lies the key observation: Most image processing requires spatial access to the image data, but that spatial access can be unrolled into a series of linear neighborhoods. Listing 2, which is based on this understanding, has better STL conformance than the previous implementation, making it more readable.
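
In the Listing 2 style, the client's responsibility shrinks to a functor that sees each neighborhood only as a linear range. A sketch of such a functor (our names, assuming 8-bit pixels; any type with begin()/end() models the linear neighborhood):

```cpp
#include <algorithm>
#include <vector>

// Median over a linear neighborhood range. The functor never knows the
// neighborhood was originally 2-D; spatial access has been unrolled away.
struct Median {
    template <class SubWindow>
    unsigned char operator()(const SubWindow& w) const {
        std::vector<unsigned char> v(w.begin(), w.end());
        std::vector<unsigned char>::iterator mid = v.begin() + v.size() / 2;
        std::nth_element(v.begin(), mid, v.end());
        return *mid;
    }
};
```
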

The question remains: How do you design both the window and subwindow iterators along with any other iterator that may be needed by an image-processing algorithm?

Iterator Design

The first step in the iterator design is to determine the concepts that image-processing iterators must support. We propose three new iterator categories that coexist with the existing STL iterator tag hierarchy (Figure 4): sequential iterators, spatial iterators, and window/subwindow iterators.

In many cases, sequential access requires only the operations provided by a forward iterator (*, ++, !=), with the underlying assumption that the spatial image grid is laid out as a series of contiguous scan lines in memory. If the image data fits this layout perfectly, forward iterators suffice. However, if each scan line carries additional constraints, then the iterator must skip invalid image data during processing.

For instance, cameras may pad the boundary of images if the requested frame size is not supported in hardware. Finding the min element in the image (a seemingly linear operation) would return garbage regardless of the image contents if padding is not respected. Another example would be a camera returning an RGB image in a 16-bit format. The iterator might need to perform bit shifting to access the appropriate R, G, and B channels. In both cases, the sequential iterator could hide these gory details from you.
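
As an illustration of the gory details such an iterator would hide, consider one plausible 16-bit packing, the common 5-6-5 layout (5 bits red, 6 green, 5 blue). The helper names are ours; a sequential channel iterator could bury these shifts inside its dereference operator:

```cpp
#include <cstdint>

// Channel extraction for a hypothetical RGB565 pixel format:
// bits 15..11 = red, 10..5 = green, 4..0 = blue.
inline unsigned r565(std::uint16_t p) { return (p >> 11) & 0x1F; }
inline unsigned g565(std::uint16_t p) { return (p >> 5)  & 0x3F; }
inline unsigned b565(std::uint16_t p) { return  p        & 0x1F; }
```
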

To implement sequential image iterators, you need knowledge of the image size and padding. We won't explicitly show code for such iterators because the technique used to implement the subwindow iterator can be adapted for sequential iteration. As for spatial image iterators, VIGRA has demonstrated an excellent approach to spatial (random-access) iteration via ++it.x and ++it.y.

For sliding window processing, you need two iterators. The first, the window iterator, is passed to std::transform and allows iteration over all subwindows in the image. This iterator must step over each pixel consecutively while remaining aware of the current x and y position in the image grid. For efficiency and ease of construction, both the subwindow and window iterators are embedded within a container class. This container stores the beginning/ending iterators along with helper methods to access and set the internal state; see Listing 3. Another approach is that of the Boost library, which aggregates the begin/end iterators in a std::pair.
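
A minimal sketch of this container idea (all names are ours; Listing 3 shows the authors' full version): the container stores the shared state and hands out begin()/end() iterators, and the window iterator steps pixel by pixel while tracking its x-y position, wrapping at the end of each scan line. Dereferencing is omitted here; a real implementation would return a stored subwindow by reference.

```cpp
#include <cstddef>

// Steps over each pixel consecutively while remaining aware of (x, y).
class WindowIterator {
public:
    WindowIterator(std::size_t x, std::size_t y, std::size_t width)
      : x_(x), y_(y), width_(width) {}
    WindowIterator& operator++() {            // step one pixel,
        if (++x_ == width_) { x_ = 0; ++y_; } // wrapping at end of scan line
        return *this;
    }
    bool operator!=(const WindowIterator& o) const {
        return x_ != o.x_ || y_ != o.y_;
    }
    std::size_t x() const { return x_; }
    std::size_t y() const { return y_; }
private:
    std::size_t x_, y_, width_;
};

// Container that owns the iteration state and produces begin()/end().
class Window {
public:
    Window(std::size_t width, std::size_t height)
      : width_(width), height_(height) {}
    WindowIterator begin() const { return WindowIterator(0, 0, width_); }
    WindowIterator end()   const { return WindowIterator(0, height_, width_); }
private:
    std::size_t width_, height_;
};
```
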

The window is constructed with a pointer to the image data and the image dimensions. This allows ease-of-use with existing image class and data representations. In fact, all that is required to use iterators is a pointer to the data and the spatial dimensions. We have explored methods where the window iterator actually wraps a ForwardIterator. For demonstration purposes, we show the simpler design (passing a pointer to data), but this design works equally well.

To reduce the number of times a subwindow is constructed, window iterators contain a subwindow object and return it by reference. Without this internal storage, subwindows would be constructed twice for each pixel (once for the beginning and once for the ending of each subwindow). The preincrement operator updates both the x and y positions in the image and requires an extra bit of logic to wrap the values at the end of each scan line.

Listing 4 presents a sample subwindow iterator implementation. Since the subwindow iterator could possibly iterate beyond the boundaries of the image, you must check boundary conditions during dereferencing. In this example, the iterator zero pads the image, which makes the dereferencing operator slightly more complicated. Also, the preincrement operator must make sure the spatial location properly wraps when the subwindow boundary is reached.
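
The boundary check can be sketched as follows (our names, a fixed 3x3 window over 8-bit pixels; Listing 4 shows the authors' general version). The iterator walks a linear position 0..8 over the window and converts it back to image coordinates only inside operator*, returning zero for locations outside the image:

```cpp
// Subwindow iterator over a 3x3 neighborhood centered at (cx, cy),
// zero-padding when a window location falls outside the image.
class SubWindowIterator {
public:
    SubWindowIterator(const unsigned char* data, int width, int height,
                      int cx, int cy, int pos)
      : data_(data), width_(width), height_(height),
        cx_(cx), cy_(cy), pos_(pos) {}

    unsigned char operator*() const {
        int x = cx_ + pos_ % 3 - 1;   // map linear position to offset
        int y = cy_ + pos_ / 3 - 1;
        if (x < 0 || x >= width_ || y < 0 || y >= height_)
            return 0;                 // zero padding outside the image
        return data_[y * width_ + x];
    }
    SubWindowIterator& operator++() { ++pos_; return *this; }
    bool operator!=(const SubWindowIterator& o) const { return pos_ != o.pos_; }

private:
    const unsigned char* data_;
    int width_, height_, cx_, cy_, pos_;
};
```
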

While the code presented here deals with gray-scale imagery, we've extended the technique to multiband images (RGB, RGBA, and so forth). The extension to support multiband images is fairly straightforward, but it requires a few additional checks during both the * and ++ operators.

Making Our New Iterators Practical

The design of new iterator types leads to the analysis of subtle issues with state persistence. The traditional STL iterator hierarchy is geared toward simple, linear data structures and lightweight iterator objects. Our subwindow iterator is clearly not lightweight (in fact, it is 64 bytes, which must be copied twice per pixel for median filtering). A quick profile of the median-filter example showed std::copy dominating as the most expensive operation.

To compensate, we extend the hierarchy to deal with light- and heavyweight iterators. The heavyweight category is handled by introducing a proxy_iterator that simply wraps the heavyweight iterator and passes all iterator operations (*, ++, !=) along to the iterator it wraps. This allows only a single pointer copy when interfacing with STL operations (like std::copy). Adding the proxy iterator greatly reduces computational costs.

Start Making Sense

Now you have them all: Standard Library iterators, VIGRA iterators, sequential iterators, heavyweight iterators. Do all these entities have nothing more in common than an operator++? How can algorithm developers write the most efficient code given the iterator concept? To answer these questions, revisit the iterator tag hierarchy. Remember, the iterator tag hierarchy is not an inheritance diagram, but a relationship between concepts.

In the new world view, both the sequential and sliding window iterators belong to the sequential_iterator category. If you were to implement an iterator that traverses an image along diagonal scan lines, it would still implement a sequential_iterator category. That is what sequential means! In contrast, if you absolutely need to visit arbitrarily ordered x-y locations in an image, your iterator is truly of the spatial_iterator category. If you have a spatial_iterator, you can go anywhere in the image, but you have to do it yourself because the Standard Library does not support multidimensional index increment. That is the price of spatial awareness!

The tag dispatching idiom lets you write functions tailored to the concept provided by the iterators. For example, if you do need to hop around the image like a knight on a chessboard, remember that sequential iterators do not carry enough state information for row/column offset calculations. If you are stuck with a sequential iterator, you must pass enough information about the image inside your functor to do the work of a spatial iterator yourself. Conversely, if your algorithm visits pixels in a fixed knight-move pattern, you are again relying only on the sequential_iterator concept: a spatial_iterator can easily produce a sequential_iterator that provides a single operator++ by internally manipulating move.x and move.y.

Another use for tags is to perform compile-time branching to use iterator proxies when running a windowing function with a heavyweight iterator. Writing iterator-category specific code yields substantial performance benefits.

From the big-picture standpoint, you're dealing with a single fundamental concept — iterators that produce subiterators for examining local neighborhoods. This concept is not unique to image processing. Graph traversal in the Boost style relies on two critical functions — vertices(graph) and out_edges(vertex, graph). The first function is analogous to the Window iterator, while the second is analogous to the SubWindow iterator. A previous article [3] mentioned using BGL's implementation of Dijkstra's single-source shortest-path algorithm for mobility analysis in digital elevation maps. But a digital elevation map is a two-dimensional array of data. Whether you want to median filter it or run Dijkstra, your neighborhood structure and iterators are exactly the same!

Performance Analysis

Our focus is on writing modern/maintainable C++ code for image processing. This approach is not intended to execute as fast as hand-optimized or CPU-specific code. Yet this method stacks up favorably against standard alternatives like nested for loops, while also reducing the amount of coding required and improving code maintainability.

For a simple analysis, we implemented three separate image-processing algorithms: median filtering, generalized convolution, and gamma correction. For each of these algorithms, we constructed three implementations: the nominal nested for loops, subwindow iterators (where appropriate), and subwindow iterators using dispatch to a proxy iterator implementation (to reduce STL overhead). Table 1 shows simple metrics for computation time and the amount of code written. Far less code is written using the window iterator methods (the iteration being hidden within the iterators). The proxy method requires extra code to handle the compile-time dispatching, but greatly decreases total run time.

New Directions

You can construct "irregular" kernels with just a few standard building blocks: std::pairs and templates:

struct Offset
{
  Offset(short dx, short dy) : dx_(dx), dy_(dy) {}
  short dx_;
  short dy_;
};

struct NullOffset { }; // end-of-neighborhood marker

// create a four-element neighborhood
typedef std::pair<Offset, std::pair<Offset,
  std::pair<Offset, std::pair<Offset, NullOffset> > > > Ngb4;

With just a few lines you can define a neighborhood corresponding to a cross pattern. Iterating over such a neighborhood is unfolded into a sequence of statements at compile time — a desirable byproduct.
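
A sketch of how such a neighborhood can be visited (the Offset and Ngb4 definitions are repeated so the sketch stands alone; forEach and the visitor are our names). Each recursive call is resolved statically, so visiting the four offsets is unfolded at compile time into a sequence of statements:

```cpp
#include <utility>

struct Offset {
    Offset(short dx, short dy) : dx_(dx), dy_(dy) {}
    short dx_;
    short dy_;
};
struct NullOffset { }; // end-of-neighborhood marker

typedef std::pair<Offset, std::pair<Offset,
    std::pair<Offset, std::pair<Offset, NullOffset> > > > Ngb4;

// Recursion ends at the marker type.
template <class Functor>
void forEach(const NullOffset&, Functor&) { }

// Visit the head offset, then unfold the tail at compile time.
template <class Head, class Tail, class Functor>
void forEach(const std::pair<Head, Tail>& ngb, Functor& f) {
    f(ngb.first);
    forEach(ngb.second, f);
}

// Example visitor: counts offsets and accumulates their dx components.
struct CountVisitor {
    int n;
    int sumDx;
    CountVisitor() : n(0), sumDx(0) {}
    void operator()(const Offset& o) { ++n; sumDx += o.dx_; }
};
```
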

Conclusion

The key idea is a suite of iterators that permits sliding window iteration over generic image data structures while reusing as much of the STL as possible, enabling rapid prototyping of image-processing algorithms. Listing 5, which illustrates the power of these iterators, shows two additional operations — generalized convolution and morphological erosion. These two examples total 20 lines of C++ code. Using these iterators to implement a sliding window image-processing operation amounts to implementing a functor to process a single returned subwindow.

References

[1] Jeremy G. Siek, Lie-Quan Lee, and Andrew Lumsdaine. The Boost Graph Library, Addison-Wesley, 2001, http://www.boost.org/.

[2] Ullrich Köthe, "STL-Style Generic Programming with Images," C++ Report, January 2000, http://kogs-www.informatik.uni-hamburg.de/~koethe/vigra/.

[3] Vitaly Ablavsky, "Applying BGL to Computational Geometry," C/C++ Users Journal, August 2002.

Acknowledgments

Portions of this work were inspired by the Vision Toolkit project at Charles River Analytics. We would like to thank our codevelopers Dan Stouch, Jonah McBride, Magnús Snorrason, and Thom Goodsell.


Vitaly Ablavsky is a senior software engineer at Charles River Analytics in Cambridge, MA. His interests are computer vision, pattern recognition, and algorithm development in C++. He can be contacted at vablavsky@cra.com. Mark R. Stevens is a senior scientist for Charles River Analytics, specializing in the development of computer vision algorithms in C++. He has been writing image processing algorithms in C++ for over 10 years. He can be reached at mstevens@cra.com. Joshua B. Pollak is a software engineer for Charles River Analytics, where he develops computer vision applications in C++. He has been working with computer vision software for 4 years and C++ for 8. He can be reached at jpollak@cra.com.