Algorithms


A Template-Based Network Implementation

John Ruark

An STL container need not be a simple sequence. It's all in the iterators.


Quick: You need to get to the local MegaComp store to buy the latest, greatest development tool, Sensual C. What's the quickest way to get there? What's the shortest possible path from your machine to the store? Suppose you have a day job as an earth-moving engineer, and your primary task is to level hilly terrain to provide foundations for bypasses. What's the least expensive way to transfer earth from the hills into the valleys and make a flat surface?

These and many other problems can be approximated or solved with network-flow algorithms. These algorithms operate on a collection of elements that are connected to each other in pairs. The elements are called nodes, and the connections between nodes are called arcs. Arcs are directed; they extend from a source node to a destination node. A network comprises the collection of nodes and the collection of arcs. Because of their complexity, networks are powerful modeling devices. Applications range from planning optimal delivery routes for distribution systems to capacity planning for intranets to optimal assignments of students to their classes. In this article I describe a versatile network implementation based on templatized arc and node types.

Why Templates?

There are common classes of network-flow problems, and specific network problems usually can be approximated by one of these classes. Common classes of problems include the shortest path, the maximum flow, the minimum-cost flow, the assignment, and the traveling-salesperson problems. The shortest-path problem is to find the shortest path along the arcs from one node to another. In contrast, the goal of the traveling salesperson problem is to find a round-trip route that visits all of the nodes once and only once, in one non-disjoint cycle with the lowest overall distance for the cycle.

Each network-flow problem requires a different set of inputs for the arcs and nodes. Aside from the source and target nodes for the path, the shortest-path problem requires only the length of each arc. The minimum cost flow problem requires constraints on the amount of flow per arc, the supply or demand of each node, and the cost per unit of flow on each arc. So, in very large problem instances (several hundred thousand arcs), it becomes inefficient to have a single network implementation that attempts to fit the needs of all network-flow problems.

However, all networks exhibit similar characteristics. As a container, one network differs from another only in the quantity and type of data stored at each node and each arc. (Separate problems such as shortest path and maximum flow have inputs that are instance specific, such as source and target nodes; these are not part of the network itself.) Furthermore, all networks must support a certain amount of common behavior. This duality — common behavior with different data — makes a network implementation via C++ templates ideal. Therefore, I have created a network template class that mimics the container classes in STL. I have also developed two shortest-path algorithms to demonstrate the versatility of the template mechanism.

The Network Representation

Any network implementation must capture the behavior common to all networks:

There are several ways to store a network, just as there are several ways to store a sequence of elements. Consider that to store a sequence of elements you can use one of many container classes, such as vector, list, set, and deque. Each container implements storage in a different way, and each has its advantages and disadvantages for particular circumstances. The same is true of networks. Some common network representations are: node-arc adjacency matrix, node-node adjacency matrix, arc-adjacency lists, and forward-star representation. The matrix-based representations provide quick access to nodes and arcs, but are space hogs. The other two representations are compact, and they provide constant look-up time for nodes and linear look-up times for arcs. My implementation uses an arc-adjacency list.

A sample network appears in Figure 1; the arc-adjacency representation for this network appears in Figure 2. The essence of the arc-adjacency representation is simple. The nodes are stored in a container, usually one that provides quick retrieval by index (such as a deque or vector). With each node is stored a collection of all the arcs that have that node as their source; this container is usually one that provides quick insertions and removals (such as a list). I choose these performance criteria for two reasons: 1) any interesting network will contain more arcs than nodes. 2) Many of the network algorithms will need to add and remove temporary computational arcs, but very few algorithms will add or remove any temporary nodes.

Network Implementation

Listing 1 shows the skeleton of the network template class implementation. (The full implementation is available on the CUJ ftp site. See p. 3 for downloading instructions.) The network template class requires three template parameters.

template<
  class _NODE,
  class _ARC,
  class A = allocator<_NODE> >
 class network;

The first argument specifies the type of objects that will be stored as nodes, the second specifies type for arcs, and the third is the allocator for the nodes. The network contains three member variables. The first, _alloc, is a copy of the allocator provided when the network is constructed. The second, node_list, is a vector that stores the nodes in the network, and the third, arc_list, is a list that stores the arcs. Because of the special relationships between nodes and arcs in a network, it is not sufficient to simply store the objects of type _NODE and _ARC in these containers. As with the template class list (see "Standard C/C++: The Header <list>," CUJ, February 1997), the network class must store internal structures that contain the information the network needs to maintain those relationships.

As shown in Figure 2, the internal arc structure must hold the _ARC class, which is the data from the client's perspective, and the destination node of that arc. While it is possible to determine the source node given an iterator to the internal arc structure, I store the source node for that arc as well. This is a conscious tradeoff of redundancy for improved access times; in ultra-large networks I might remove the source node and instead use code logic to compute the source node from the iterator. The internal arc structure appears at the top of Listing 1. The data members are as follows:

struct _iArc
{
    // ...    
    _ARC data;
    size_type source_node;
    size_type destin_node;
};

To avoid circular dependencies, this structure uses a size_type to refer to the nodes, as opposed to a node iterator. This is possible because the node container will support direct lookups with size_type indices. The operators are declared but never defined; they keep some compilers (such as Microsoft Visual C++ 5.0) happy.

Even though the representation in Figure 2 calls for one list of arcs for each node, this implementation stores all of the arcs in a single list. This decreases the total storage requirements of the network and makes iterating over every arc easier. Single-list storage is illustrated in Figure 3. Instead of each node storing its own list, the network stores an iterator to the single list, bound to the arc that is the first arc leaving that node. The arc list is defined in the network class as:

typedef list<
    _iArc,
    arc_allocator_type > ARCLIST;
ARCLIST arc_list;

With the arcs defined, I can now define the nodes:

typedef ARCLIST::iterator _arcIt;
struct _iNode
{
    // ...

    _NODE data;
    _arcIt firstArcOut;
};

typedef vector<
    _iNode,
    node_allocator_type > NODELIST;
NODELIST node_list;

The firstArcOut member stores the iterator to the first arc in the chain of arcs with this node as their source. These iterators are internal however, because they provide access only to the internal structures. So, the network class needs to define iterators for the client to use as well.

The Joy of Iteration

Much of the network class is dedicated to defining the various supported iterators. The network class exposes eight iterator types to clients. The node and arc iterators come in const and non-const versions, as well as forward and reverse. The network class completely defines the four forward iterators; it defines the reverse iterators through the reverse_bidirectional_iterator template.

I modeled the forward iterators after the STL vector implementation. I implement the node iterators as an an embedded class node_iterator and a const-version as an embedded class const_node_iterator.

In the vector implementation, the contained variable used for iterating is usually a pointer to the contained data element. In node_iterator, the contained variable is an iterator itself (a NODELIST::iterator). Client iteration mainly delegates to the internal iterator, as in operator++() below:

template<class _NODE, class _ARC, class _A>
class network
{
// ...
  class node_iterator :
    public _Bidit
      <node_value_type,
       node_difference_type>
  {

// ...

    node_iterator& operator++()
      { ++_Ptr;
        return (*this);}
  };
};

When clients dereference a node_iterator, they want the underlying _NODE, which is excavated via the internal NODELIST::iterator and the internal node structure _iNode.

The network class provides iterators to the beginning and ending of the node list as begin_nodes(), end_nodes(), rbegin_nodes(), and rend_nodes(), all of which delegate to the internal node_list list. These iterators follow the semantics of the STL iterators begin(), end(), rbegin(), and rend(). Note, for example, that end_nodes() points to the hypothetical position just past the last node, just as the STL's end iterator points just past the contained sequence. In general, any function that works with nodes has _nodes appended to its name, while a function that works with arcs has _arcs appended.

To implement arcs, I followed closely the implementation in the list header that ships with Microsoft's compilers. In particular, I provide an arc_iterator class and a const_arc_iterator class that derives from arc_iterator. Here again, the implementation of these iterators delegates to the underlying list iterators. The iteration process is much more interesting with arcs than with a list, owing to the need for iteration over subsets of arcs. Again, I provide the standard-like functions begin_arcs(), end_arcs(), rbegin_arcs(), and rend_arcs(), each of which takes no parameters. With the help of these functions it is possible to iterate over every arc in the network. However, I also overload these functions to take a node_iterator or a node index, which lets you iterate over those arcs that have the referenced node as their source.

The beginning of the arc list for a given node is just the firstArcOut member of the _iNode structure for the desired node. Finding the end of the list is trickier, and now is the time to address the issue of what the firstArcOut member refers to when there are no arcs coming out of the node. Figure 1 shows that node 2 has no outgoing nodes; in Figure 2, this is reflected by node 2's lack of a pointer to an arc. In the single linked list of arcs, this is represented by node 2's firstArcOut being equal to the firstArcOut of node 3. Therefore, if nit is a node_iterator, the list of arcs originating in the node pointed to by nit is given by the range

[ (*nit._Myit()).firstArcOut),
(*(++nit)._Myit()).firstArcOut )

where the upper bound is not included (_Myit() returns the container iterator). Thus, if the lower bound equals the upper bound, the set is empty and there are no outgoing arcs.

The big problem comes at the end of the node list, where ++nit is no longer a valid node_iterator. This is a classic boundary-condition scenario, which I chose to handle with program logic. An improved technique may be to add a dummy node at the end of the list, which then requires adding boundary-condition logic to functions that add and remove and iterate over nodes (perhaps a better choice, too, given that most algorithms spend far more time iterating over arcs than nodes).

Inserting and Removing

The only challenge in inserting and removing nodes and arcs is to keep all the relationships straight. When a node is inserted into the network, through a call to insert_nodes, it may cause the index of some nodes to change. Because the arcs maintain their incident nodes by index, the insertion algorithm must iterate over and update all of the arcs. This iteration occurs in the _patch_arcs_insert function (Listing 2) .

Removing nodes is slightly more difficult, because any arc incident to a node being removed must also be removed. Because erasing nodes may also cause node indices to shift, the arcs must be checked for these shifts as well. So, the erase_nodes routines iterate over all arcs, deletes those that are incident to a node being removed, and updates those whose incident nodes have shifted. It then removes the nodes.

There are two ways to insert an arc, both through overloaded versions of member function insert_arcs. You can specify both the source node and destination node (as either iterators or indices) for the new arc, or you can specify a destination node and another arc with the same source node for the new arc. The trick to inserting arcs is to patch all of the firstArcOut iterators for nodes preceding the new arc's source node. For example, in a five-node network with no arcs, all firstArcOut values point to the end of the arc list. If you insert an arc from node 2 to 3, then the firstArcOuts for nodes 0 and 1 must point to that arc, while those for nodes 3 and 4 still point to the end of the arc list. Removing arcs is much the same. If you remove an arc that happens to be a firstArcOut for some node, that node and nodes preceding it may need updating. This behavior is captured in the erase_arcs member function.

Other network functions

Many of the remaining functions simply expose features of the contained member variables. These functions include clear, clear_arcs, back_nodes, capacity_arcs, capacity_nodes, empty, empty_arcs, front_nodes, max_size_nodes, pop_back_nodes, push_back_nodes, reserve_nodes, resize_nodes, size_arcs, and size_nodes. Simply remove the suffix and the function has the same behavior as the corresponding vector or list function.

The function node returns a reference to the indexed node, and the functions arcDestinNode and arcSourceNode return the index of an arc's destination and source node, respectively.

Example: A Shortest-Path Algorithm

Now it's time to make something happen with our network. Listing 3 shows one of two shortest-path algorithms. These functions have the same parameter list, so by just changing the name of the function you can get different algorithmic behavior.

The first routine solves the shortest path using the heap implementation of Dijkstra's algorithm, and the second uses a FIFO label-correcting algorithm [1].

These algorithms solve the "all-shortest paths problem," which is to find the shortest path from one node to all other nodes. Each function takes five parameters. The first is a reference to the network. The second is the source node from which to find all the paths. The third is a reference to a vector containing type _NET::size_type elements. Upon successful completion, the ith element of this vector is the node that precedes node i in the shortest path from the source node to node i. By tracing the list of predecessors you can enumerate the shortest path backwards to any node from the source. The fourth parameter is a reference to a vector that will contain the shortest distance from the source node to ith node, for vector element i. The fifth element is similar to a predicate, in that it provides the metric functionality for distances, described shortly. These functions return true on success; the latter function can detect negative cycles in the network (leading to minus-infinity shortest paths) and will return false if it finds one.

The shortest-path problems are complex; they must add distances as well as compare them. Because of this complexity, the predicate class is correspondingly complex. For shortest-path problems, the predicate must contain a specific value_type (such as int or double), static values for infinity and zero, and an operator() that, given an _ARC, returns the value_type representing the distance for that arc. Here, for example, are a network and corresponding predicate declaration:

typedef network<int, int, allocator<int> > NETWORK;

struct ARCCOST
{
    typedef int value_type;
    static const int infinity;
    static const int zero;
    int operator()(const int& arc) const
        { return arc; }
};

In this network the nodes and arcs each contain one int. The distance of an arc is represented by the int. The corresponding parameters to the shortest-path algorithms would be, in order NETWORK&, NETWORK::size_typed>, vector<NETWORK::size_type, allocator<NETWORK::size_type> >&, vector<ARCCOST, allocator<ARCCOST> >&, and ARCCOST&. The value_type of the predicate must support addition, assignment, and comparisons.

Using an abstracted distance measure has the benefit that you can use doubles or ints or any other type as appropriate. Using strings as distances, you could calculate a shortest-path over a network of letters. Sample routines that exercise the network and the shortest-path algorithms are shown in Listing 4.

Conclusion

I have presented but one way of many to implement a network. A more space-conscious implementation would forego some member variables and embed more logic. A more feature-rich implementation would add the ability to iterate over arcs sharing the same destination node. A network optimized for speed might use a separate list of arcs at each node, making node and arc insertions constant-time operations. This implementation, though, is sufficient to manage many common numerical network problems. o

Note

[1] My implementations of these algorithms and the sample networks in Listing 4 are derived from Network Flows, by Ahuja, Magananti, and Orlin, Prentice-Hall, Inc., 1993, ISBN 0-13-617549-X.

John Ruark is about to finish his Ph.D. in Operations Research at MIT. His thesis develops an architecture for the implementation of algorithms in integrated modeling environments using component object systems. This is his first published article. He can be reached at jruark@worldnet.att.net.