Features


Manipulating Sparse Matrices

Mark C. Peterson

Many matrix computations involve far more zeros than nonzero element values. A representation that squeezes out zeros can really speed up several critical operations.


A very good technique for efficiently handling matrices is to take advantage of the fact that most of the coefficients in real-world matrices are zeros. Matrices that are mostly zeros are called sparse matrices. Their counterparts, which contain very few zeros, are called dense matrices. This article focuses on sparse matrices. I present a stable, easy-to-use matrix template class upon which you can build complex algorithms that compile to efficient object code. I also show how to use this template class to solve linear systems.

The storing, adding, and multiplying of large quantities of zeros can be a horrendous waste of computing power. I use template class smatrix to specify base classes that efficiently handle these types of matrices. (A partial listing of smatrix is shown in Figure 1. smatrix is much too large to show in its entirety here. The full source code is available on the CUJ ftp site. See p. 3 for downloading instructions.) This template leverages the template classes found in the Standard C++ headers <map> and <vector> to produce highly efficient code that is orders of magnitude faster than dense-matrix algorithms. For example, using smatrix<t> I am able to solve a 2,000 x 2,000 matrix with 15,000 nonzero entries in nine seconds on a 330 MHz Pentium class machine.

Storage Format

I use a vector<t> member e in smatrix<t> to store the nonzero matrix coefficients. Each row in smatrix<t> has a corresponding object of class map<size_t, size_t> that maps the column index for that row to the coefficient position in e. Likewise, each column also has a similar map object that maps the row indices for that column. The iMap and jMap members in smatrix<t> are vectors that contain the pointers to these maps. These mapping vectors are base one rather than base zero to match the conventional matrix index notation. The extra maps at iMap[0] and jMap[0] are used to reference zero coefficients.

Figure 2 shows the output from the function smatrix<t>::PrintListing, defined in the header smatrix.h, for a random 5x5 matrix. (The file smatrix.h is too large to list in this article, but is available on the CUJ ftp site.) To look up the coefficient for a matrix element in row 2, column 5, you can use the map pointed to by iMap[2], which contains a list of all the entries for row 2, sorted by column index. A search of this map for column index 5 would retrieve offset 3. I could then retrieve the coefficient at e[3]. Alternatively, you can use jMap in a similar manner. Here you would look up row 2 in jMap[5] to retrieve offset 3.
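As a standalone illustration of this lookup, here is a minimal sketch using standard containers. The names mirror the article's iMap and e members, but this is not the actual smatrix code:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Illustrative sketch of the row-map lookup: iMap[i] maps a column
// index to that coefficient's offset in e. Not the actual smatrix code.
double Lookup(const std::vector<std::map<std::size_t, std::size_t> >& iMap,
              const std::vector<double>& e,
              std::size_t i, std::size_t j)
{
    const std::map<std::size_t, std::size_t>& row = iMap[i];
    std::map<std::size_t, std::size_t>::const_iterator it = row.find(j);
    return it == row.end() ? 0.0 : e[it->second];  // absent entry == zero
}
```

Looking up element (2, 5) of the Figure 2 matrix finds offset 3 in iMap[2] and returns e[3]; looking up a column with no entry returns zero without touching e at all.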

Each mapping contains a well-ordered set of indices that you can use efficiently in matrix algorithms. For example, suppose the iMap and jMap elements belong to the multiplicator and multiplicand arguments, respectively, of a matrix multiplication algorithm. A typical multiplication algorithm calculates each (i, j) element in the product matrix as the dot product of row i in the multiplicator with column j in the multiplicand. This would require ten floating-point operations (five multiplications and five additions to an accumulator) to calculate element (2, 1) in a fully populated 5x5 matrix. For a sparse matrix, I instead sum only the products of matching entries in row 2 and column 1:

 row(2)      (1,1)        (3,2)        (5,3)
*col(1)             (2,1)        (4,5) (5,8)
--------------------------------------------
elem(2,1) =                        e[3]*e[8]

Here I need to perform only two floating-point operations and five integer comparisons.

What if these two rows and columns were in a 2,000x2,000 matrix instead of a 5x5 matrix? Using dense-matrix methods would require 2,000 multiplications and 2,000 additions. You could modify the dense matrix code to avoid the multiplications and additions by explicitly checking for zeros, but this still involves 2,000 floating-point comparisons for just this one element. Using sparse-matrix techniques, you would need only two floating-point operations and five integer comparisons to calculate this element regardless of the overall matrix dimensions.
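The matching-entry summation can be sketched as a merge of two sorted index maps. This standalone version works directly on std::map rather than on smatrix's internals, and the name SparseDot is illustrative:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Sparse dot product of one row and one column, each stored as a
// sorted map from index to an offset into the coefficient vector e.
// A standalone sketch of the technique; smatrix's internals differ.
double SparseDot(const std::map<std::size_t, std::size_t>& row,
                 const std::map<std::size_t, std::size_t>& col,
                 const std::vector<double>& e)
{
    double sum = 0.0;
    auto r = row.begin();
    auto c = col.begin();
    // Walk both sorted index lists in step; multiply only where the
    // indices match, skipping every implicit zero along the way.
    while (r != row.end() && c != col.end()) {
        if (r->first < c->first)      ++r;
        else if (c->first < r->first) ++c;
        else { sum += e[r->second] * e[c->second]; ++r; ++c; }
    }
    return sum;
}
```

Because both maps are sorted, the loop runs in time proportional to the number of nonzero entries in the two operands, never the matrix dimension.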

Note that while I designed the smatrix template class to efficiently handle sparse matrices, it also works perfectly well with dense matrices. There is some additional overhead associated with the map storage class that you would not have in algorithms designed expressly for dense matrices, but this additional overhead is trivial compared to the computing complexity associated with the dense matrix itself.

Populating Matrix Elements

smatrix has four constructors:

smatrix();
smatrix(const smatrix<t>& a);
smatrix(const t* d, size_t iDim,
   size_t jDim);
smatrix(size_t iDim, size_t jDim,
   size_t MaxNZ);

The first constructor instantiates an empty matrix, the second is a copy constructor, and the third initializes a matrix from a given array of fully populated elements. The last constructor instantiates a matrix with pre-allocated storage, where MaxNZ is the maximum number of expected nonzero entries.

The first three functions in the test program main.cpp, Listing 1, show how to use these constructors to create the matrix shown in Figure 1. Function CoefInit5x5 constructs an empty matrix c and explicitly assigns the coefficients. While this is notationally convenient, it is rather inefficient. The code in smatrix will have to dynamically resize the iMap, jMap, and vector<t> members as their capacities are exceeded.

Function ArrayInit5x5 initializes the matrix by passing it a fully populated array representation of the matrix. Function InsertInit5x5 shows the most efficient initialization method. It first constructs c by declaring the matrix dimensions and number of nonzero elements. It then inserts each nonzero coefficient.
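The benefit of the MaxNZ constructor can be illustrated with std::vector alone: reserving capacity up front guarantees that later insertions never reallocate the coefficient store. This is a sketch of the idea, not the constructor's actual code:

```cpp
#include <cstddef>
#include <vector>

// Sketch of why the MaxNZ constructor is the fastest initializer:
// one up-front reservation means later push_backs never reallocate.
// maxNZ stands in for the expected number of nonzero coefficients.
std::vector<double> BuildCoefStore(std::size_t maxNZ)
{
    std::vector<double> e;
    e.reserve(maxNZ);   // one allocation, like smatrix's MaxNZ hint
    return e;
}
```

The same reasoning applies to the iMap and jMap vectors: sizing them once at construction avoids the repeated dynamic resizing that CoefInit5x5 provokes.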

Manipulating Rows and Columns

Three functions in main.cpp, Listing 1, demonstrate how to create algorithms that manipulate the matrix row and column vectors. The vectors described in this context are the mathematical ones rather than objects of class vector<t> from the Standard C++ library.

Template function MyMultiply is a simplified version of smatrix<t>::operator* for multiplying two matrices. It forms the product matrix element by element by calculating the dot product of the row vectors in matrix a and the column vectors in matrix b. Likewise, template function MyAdd is a simplified version of smatrix<t>::operator+. Both functions use the row and col member functions to access specific rows and columns.

The third template function MyUpperTrans transposes the upper triangular portion of one matrix into the lower triangle of a new matrix. This function references portions of row and column vectors rather than the entire vector by using rng to specify a range restriction.

I use three classes of mathematical vectors within smatrix to efficiently enable the manipulations shown in Listing 2:

const_svector_ref
svector_ref
svector

These classes return sparse vectors. The reference versions are related to the matrix that originated them. For example, the call row(2) on a const matrix will return a const_svector_ref that contains a pointer to the original matrix, a flag indicating that it is a row-oriented sparse vector, and the row index value of 2. The non-reference instances of svector are temporaries resulting from manipulating the references.

The expression row(2) - row(3) generates a temporary svector that contains the difference between rows 2 and 3. The non-constant sparse vector reference instances, encapsulated by the svector_ref class, are intended for assignment within the non-constant matrix. The row(1) portion of the expression row(1) = row(2) - row(3) returns an svector_ref instance that is assigned the temporary svector generated by row(2) - row(3).
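The merge that produces such a temporary can be sketched with std::map. This illustrative SparseSub is not the svector implementation, and a real sparse class would also prune entries that cancel to exactly zero:

```cpp
#include <cstddef>
#include <map>

// Sketch of sparse vector subtraction in the spirit of
// row(2) - row(3): merge two sorted (index -> value) maps,
// keeping every index present in either operand.
std::map<std::size_t, double>
SparseSub(const std::map<std::size_t, double>& a,
          const std::map<std::size_t, double>& b)
{
    std::map<std::size_t, double> r(a);
    for (std::map<std::size_t, double>::const_iterator it = b.begin();
         it != b.end(); ++it)
        r[it->first] -= it->second;   // a missing entry starts at 0.0
    return r;                         // a real class would drop exact zeros
}
```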

Matrix Transposition and Pivots

Independent storage of the i and j indices also supports extremely efficient matrix transpositions. The function smatrix<t>::Transpose executes a transposition by simply interchanging the two pointers to the iMap and jMap vectors and interchanging iDim and jDim, which requires about six assignments regardless of the size of the matrix.
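The trick can be sketched with an illustrative skeleton class; the member names only loosely mirror smatrix's:

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Sketch of O(1) transposition: because the row and column index
// maps are stored independently, transposing is just swapping the
// two map vectors and the dimensions. Names are illustrative.
struct SparseSkeleton {
    std::vector<std::map<std::size_t, std::size_t> > rowMaps; // like iMap
    std::vector<std::map<std::size_t, std::size_t> > colMaps; // like jMap
    std::size_t iDim = 0, jDim = 0;

    void Transpose() {
        std::swap(rowMaps, colMaps); // rows become columns
        std::swap(iDim, jDim);       // and the dimensions swap too
    }
};
```

Swapping two std::vector objects exchanges their internal pointers, so the cost is constant no matter how many coefficients the matrix holds.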

Interchanging individual rows and columns in a matrix is equivalent to reordering the original equations and variables respectively. This is known as pivoting. The smatrix template provides two functions to support explicit pivoting:

void PivotRow(size_t a, size_t b,
   smatrix<t> *pMatrix = NULL);
void PivotCol(size_t a, size_t b,
   smatrix<t> *pMatrix = NULL);

These functions interchange row or column a with row or column b and execute a corresponding pivot in the smatrix<t> pointed to by pMatrix if it is not null.

Two similar functions also execute pivots, but these look for the coefficient with the largest magnitude in the rows or columns with indices greater than k:

bool PivotRow(size_t k, t& MaxCoef,
   smatrix<t> *pMatrix = NULL);
bool PivotCol(size_t k, t& MaxCoef,
   smatrix<t> *pMatrix = NULL);

You generally use these functions to perform a partial pivot in a forward-elimination algorithm. Each returns a value of false if it is unable to locate a coefficient with an absolute magnitude greater than the matrix's current round-off error estimate, epsilon. Each also sets the MaxCoef reference to the coefficient of the located row or column.
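The search these functions perform can be sketched over a single std::map column. FindPivotRow here is an illustrative standalone function, not the smatrix member:

```cpp
#include <cmath>
#include <cstddef>
#include <map>

// Sketch of a partial-pivot search: scan the stored entries of
// column k (row index -> coefficient) at or after row k for the
// largest magnitude, reporting false if nothing exceeds epsilon.
bool FindPivotRow(const std::map<std::size_t, double>& colK,
                  std::size_t k, double epsilon,
                  std::size_t& pivotRow, double& maxCoef)
{
    double best = epsilon;
    bool found = false;
    for (auto it = colK.lower_bound(k); it != colK.end(); ++it) {
        if (std::fabs(it->second) > best) {
            best = std::fabs(it->second);
            pivotRow = it->first;     // row to swap into position k
            maxCoef = it->second;     // analogous to the MaxCoef reference
            found = true;
        }
    }
    return found;
}
```

As with the dot product, lower_bound lets the scan skip every row that holds no stored coefficient in this column.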

I also include in the smatrix template a GetDiag function with the following prototype:

bool GetDiag(size_t k, t& Diag,
   bool bWithPartialPivot = false,
   smatrix<t> *pMatrix = NULL);

This function sets the Diag reference to the diagonal coefficient at (k, k). If the bWithPartialPivot flag is true, it calls PivotRow, which interchanges row k with a later row if that row holds a coefficient of greater absolute value. It also performs a corresponding pivot on any matrix pointed to by pMatrix.

Random and Identity Matrices

Template classes TRndSMatrix and TRndDMatrix in RndSMatrix.h (not shown) create random sparse and dense matrices, respectively. Here are two examples from main.cpp, Listing 1:

a = TRndDMatrix<double, double>
    (4, 5, -6, 7);
b = TRndSMatrix<double, int>
    (8, 9, -11, 12, 50);

Matrix a is assigned a densely populated 4x5 smatrix<double> with double-precision coefficients ranged between -6 and 7. Matrix b is assigned a sparsely populated 8x9 smatrix<double> with 50 nonzero integer coefficients ranged between -11 and 12.

Most purely random sparse matrices are not solvable. The TRndSolvable template class creates a random sparse matrix that is very likely solvable. It does this by creating a random matrix with nonzero coefficients along the diagonal. It then expands the number of coefficients along each side of the diagonal until it reaches the desired number of nonzero coefficients. This creates what is known as a banded matrix. After creating the banded matrix, it shuffles the rows and columns using the PivotRow and PivotCol functions.

The smIdentity<t> class creates an smatrix<t> identity matrix for a given dimension. You specify the dimension of the matrix in the constructor.

Inversion and Equation Solving

There are many different ways to solve a set of linear equations, but they all fall into one of two categories — direct or iterative. Iterative methods involve an initial estimate of the correct answer. You run this estimate through a formula to arrive at a new estimate that is hopefully closer to the correct solution. You continue iterating these estimates through your formula until you are satisfied that the estimate is close enough. If the iterations diverge, then the matrix does not have a solution using that technique.

The JonesMayerInv template function in main.cpp, Listing 1, implements an iterative method for calculating the inverse of a matrix suggested by Jones and Mayer in 1995 and described in the book Topics in Advanced Scientific Computation, by Richard E. Crandall [1]. Jones and Mayer discovered how to adapt Newton's method for calculating reciprocals to calculating matrix inversions.

Direct methods use variations on Gaussian elimination to solve a set of linear equations and are generally much faster than iterative methods. Of the direct methods, LU decomposition is considered the fastest method for solving linear systems. The TLUDecomp template in LUDecomp.h, Listing 2, implements this algorithm as a subclass of smatrix<t>. The TestLUDecomp function in main.cpp shows how to use the TLUDecomp<t> constructor and the TLUDecomp<t>::Solve function to solve a linear system. Note that the TLUDecomp<t> constructor results in an empty matrix if the system is not solvable. You can use smatrix<t>::operator!() to check for an empty matrix.

Note that if the LU decomposition fails, the system may still be solvable using an iterative method. This is because the LU decomposition may have failed due to round-off error. There are iterative methods available that compensate for the round-off error inherent in the direct methods.

The TLUDecomp template is actually composed of three matrices: the lower triangular factor L, the upper triangular factor U, and a matrix of recorded pivots. The two triangular factors are combined as a single matrix in the smatrix<t> base class; you can extract these embedded matrices using the L and U functions. The pivots matrix starts out as an identity matrix with the same dimension as the matrix being decomposed. Each time the decomposition performs a partial pivot during the forward-elimination phase, it applies the same pivot to the pivots matrix, so that the pivots matrix ends up representing the pivoted identity. The P function returns a copy of the pivots matrix.

The TLUDecomp<t>::Solve function does not restrict you to solving for single vectors, i.e. matrices with only one column. You can pass the TLUDecomp<t>::Solve function an smatrix<t> with multiple columns, where each column in the matrix is a vector you would like solved. I use a similar technique in the TLUDecomp<t>::Inverse function, where I arrive at the inverse by performing a solution on the identity matrix. The TestLUInverse in main.cpp shows a test of this algorithm.

Row and Column Iterators

The code in LUDecomp.h, Listing 2, also demonstrates how to use the smatrix<t> row and column iterators. These iterators generally reference only nonzero coefficients. Iterating in this manner is key to the efficiency of smatrix<t>-based algorithms. Note that there could be zero or near-zero coefficients within the matrix as the result of a previous addition or subtraction.

There are two forms of the IterForRow and IterForCol functions. The IterForCol(k) form used in the BackElim function returns an iterator for the first nonzero coefficient in column k. The IterForCol(k, k + 1) form used in the FwdElim function returns an iterator for the first nonzero coefficient in column k at or after row k + 1. Both forms support the return of constant and non-constant iterators.

I also provide Index and Coef functions that return the index and coefficient values, respectively, associated with the current location of the iterator. The EndOfRow and EndOfCol functions test the iterator to determine if it has been incremented past the last nonzero coefficient.
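The iterator idiom maps naturally onto std::map. This illustrative sketch mimics IterForCol(k, k + 1) with lower_bound, visiting only stored coefficients:

```cpp
#include <cstddef>
#include <map>

// Sketch of iterating a column's nonzero entries starting at a given
// row, in the spirit of IterForCol(k, k + 1): lower_bound jumps
// directly to the first stored entry at or after that row.
double SumColumnBelow(const std::map<std::size_t, double>& col,
                      std::size_t firstRow)
{
    double sum = 0.0;
    for (auto it = col.lower_bound(firstRow); it != col.end(); ++it)
        sum += it->second;   // it->first plays the role of Index,
                             // it->second the role of Coef
    return sum;
}
```

Reaching col.end() corresponds to EndOfCol returning true: the loop never examines a row that holds no stored coefficient.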

Calculating the Determinant

The smatrix template provides two functions for calculating the determinant — smatrix<t>::det and smatrix<t>::lnScaledDet. The det function performs a forward elimination and returns the product of the resulting diagonals times -1 if it performed an odd number of pivots.

Large matrices that are even slightly scaled can have very large determinants that easily overflow even double-precision numbers. The lnScaledDet function operates in a similar fashion to the det function, except that it sums the natural log of the absolute value of the diagonals while keeping track of the sign of the underlying determinant. The TestDeterminant function in main.cpp shows how to use the lnScaledDet function to remove the scaling factor from a matrix, creating an unscaled matrix whose determinant is +/-1.0.
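The log-summing idea can be sketched independently of the elimination step. This illustrative function assumes the post-elimination diagonals are already available as an array:

```cpp
#include <cmath>
#include <cstddef>

// Sketch of the lnScaledDet idea: accumulate ln|d| over the
// diagonals and track the sign separately, so a huge determinant
// never overflows a double.
double LnAbsDetFromDiagonals(const double* diag, std::size_t n, int& sign)
{
    double lnDet = 0.0;
    sign = 1;
    for (std::size_t k = 0; k < n; ++k) {
        if (diag[k] < 0.0) sign = -sign;      // each negative diagonal flips it
        lnDet += std::log(std::fabs(diag[k]));
    }
    return lnDet;
}
```

The determinant itself is sign * exp(lnDet), but for large matrices you normally work with the log form directly rather than exponentiate.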

Matrix Epsilon

Round-off errors are the bane of matrix computations. Small errors in poorly conditioned matrices can lead you down the garden path into believing you have arrived at a valid solution, when it is in fact not even close.

The Standard C library defines a macro related to double-precision floating-point round-off error in the header <float.h>. The macro DBL_EPSILON is the smallest value such that 1.0 + DBL_EPSILON compares unequal to 1.0. The smatrix template maintains a member epsilon for each matrix. For the smatrix<double> classes, the value of epsilon is DBL_EPSILON times the absolute value of the maximum matrix coefficient. Coefficients less than epsilon are treated as zero. Thus, division during forward elimination by a diagonal coefficient less than the matrix's current epsilon indicates that the matrix is not solvable by that technique.
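The epsilon computation itself is short. This sketch assumes the coefficients are available as a flat array, which is not how smatrix stores them:

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <cstddef>

// Sketch of the matrix epsilon: scale DBL_EPSILON by the largest
// coefficient magnitude; anything smaller is treated as round-off.
double MatrixEpsilon(const double* coefs, std::size_t n)
{
    double maxAbs = 0.0;
    for (std::size_t k = 0; k < n; ++k)
        maxAbs = std::max(maxAbs, std::fabs(coefs[k]));
    return DBL_EPSILON * maxAbs;
}
```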

The Solvable and Unsolvable functions in main.cpp show the effects of round-off errors in ill-conditioned matrices and demonstrate that the TLUDecomp template constructor produces an empty matrix when those errors are too severe.

Optimizations and Temporaries

You will notice that non-optimized debug compilations of smatrix<t> code run many times slower than the optimized versions. This is due to the extensive use of inline functions and the returning of temporaries. Consider this template function that returns a temporary:

template <class t>
smatrix<t> ReturnTemporary(
const smatrix<t> &m)
{
   smatrix<t> c;
   // some stuff that manipulates 'c'
   return(c);
}

cout << ReturnTemporary(m) << endl;

A literal compilation constructs and manipulates c within the function and calls the copy constructor to return the result to the output stream. An optimized compilation should compile the template code with the unnecessary temporaries removed. That version constructs c once, manipulates its contents, and passes the reference to c to the output stream.

Be forewarned that while Microsoft VC++ v6.0 will remove unnecessary temporaries, other compilers may not. For the optimizers in other compilers, you should either check the compiler documentation or examine the compiled assembler code listing.

FastHeap Dynamic Allocation

One of the drawbacks to using a Standard C++ library map template is that it continually allocates and frees tree nodes. The C++ heap manager for most compilers is designed to continuously defragment the heap by merging freed blocks with adjacent blocks that are also free. This entails significant overhead when used frequently, as Vladimir Batov described in an article published last year. (See "A Quick and Simple Memory Allocator" in the January 1998 issue of CUJ [2].) Unfortunately, Batov's code will not work on the map template's tree nodes, because the map template allocates them from the global heap rather than through the allocator<t> in the template declaration.

Instead, I use a similar technique that intercepts global heap usage by overriding ::operator new and ::operator delete. This code is not shown here, but it is available in the source code archive accompanying this article. After you call a function named EnableFastHeap, it caches small freed blocks (less than 256 bytes) rather than returning them to the global heap. The memory thus remains fragmented until a subsequent call to DisableFastHeap returns the cached blocks to the global heap. This technique roughly doubles execution speed.
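The caching idea can be sketched as a class rather than as the global operator overrides the article describes. The 256-byte threshold follows the text; the bucket granularity, names, and everything else here are illustrative:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Illustrative small-block cache in the spirit of FastHeap: freed
// blocks under a size threshold go onto per-size free lists instead
// of back to malloc, so the next same-size allocation is a pop.
class SmallBlockCache {
    enum { MaxSmall = 256, Granule = 8 };
    std::vector<void*> freeList[MaxSmall / Granule];
    static std::size_t Bucket(std::size_t bytes)
    { return (bytes + Granule - 1) / Granule - 1; }
public:
    void* Allocate(std::size_t bytes) {
        if (bytes == 0) bytes = 1;
        if (bytes <= MaxSmall) {
            std::vector<void*>& list = freeList[Bucket(bytes)];
            if (!list.empty()) {           // reuse a cached block
                void* p = list.back();
                list.pop_back();
                return p;
            }
            // round up so the block can later be cached by bucket
            return std::malloc((Bucket(bytes) + 1) * Granule);
        }
        return std::malloc(bytes);
    }
    void Free(void* p, std::size_t bytes) {
        if (bytes == 0) bytes = 1;
        if (bytes <= MaxSmall) freeList[Bucket(bytes)].push_back(p);
        else std::free(p);
    }
    ~SmallBlockCache() {                   // plays the DisableFastHeap role
        for (std::size_t b = 0; b < MaxSmall / Granule; ++b)
            for (std::size_t i = 0; i < freeList[b].size(); ++i)
                std::free(freeList[b][i]);
    }
};
```

A global-override version routes ::operator new and ::operator delete through such a cache; the class form shown here just makes the mechanism visible in isolation.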

Information Sources

[1] Richard E. Crandall. Topics in Advanced Scientific Computation (Springer-Verlag, 1996). ISBN 0387944737.

[2] Vladimir Batov. "A Quick and Simple Memory Allocator," C/C++ Users Journal, January 1998.

[3] The University of Notre Dame publishes a sparse-matrix library at http://www.lsc.nd.edu/research/mtl/ that you may wish to investigate. It does, however, require a good deal of knowledge regarding its internal functions and formats.

[4] The National Institute of Standards and Technology has a Matrix Market web site at http://math.nist.gov/MatrixMarket/ that contains a set of matrices you can use to test matrix software. The root page at http://math.nist.gov also has links and a search engine to other information sources.

[5] David S. Watkins. Fundamentals of Matrix Computations (John Wiley & Sons, 1991). ISBN 0471614149. This is an excellent book for anyone who would like to understand the theory of matrix computations.

[6] Gene H. Golub and Charles F. Van Loan. Matrix Computations, Johns Hopkins Series in the Mathematical Sciences (Johns Hopkins University Press, 1996). ISBN 0801854148, paperback; 080185413X, hardcover. This book provides much more depth than Watkins'.

[7] Yousef Saad. Iterative Methods for Sparse Linear Systems, The PWS Series in Computer Science (PWS Publishing Company, 1996). ISBN 053494776X. This book covers iterative techniques for dealing with sparse matrices that do not solve well using the direct methods discussed in this article.

Mark C. Peterson is a Business Applications Systems Developer for Northeast Utilities. He has published several books, including the Borland C++ Developer's Bible. He is currently pursuing a Masters in CS at RPI. You can email him at petermc1@home.com.