Tom is president of Rogue Wave Software. He can be contacted at P.0. Box 2328, Corvallis, OR 97339.
Fortran has long been the lingua franca of the numerics world. Yet Fortran's shortcomings have become a tired joke amongst programmers. Its limited type checking, lack of extensibility, and reliance on global data make it extremely hard to maintain and debug. So why does it survive?
The reasons are simple: momentum and efficiency. Countless complicated algorithms have been implemented in Fortran, and no one wants to rewrite them. And, for all its warts, Fortran is still an extraordinarily efficient language.
It's not enough to offer a language that's just as good as Fortran. If people are to switch, the replacement language must not only be the equal of Fortran in terms of efficiency and code reuse, but it must also be a lot better in terms of productivity, maintenance, and power. A tall order!
C++ meets these criteria, making it (in my mind) the first serious contender to challenge Fortran's supremacy. With care, C++ can meet Fortran's efficiency (although not exceed it, except in the sense that it can make previously too complicated algorithms feasible). It is also easy to call existing, known-to-be-correct Fortran code from C++ (at least under UNIX and some dialects on the PC). The kicker is that with C++, it can be much easier to build and maintain the really big, gnarly code being contemplated today. Here's why:
ComplexVec b; // Declare a complex vector
cin >> b; // Read it in
FFTServer s; // Allocate an FFT server
// Calculate the transform of b:
ComplexVec theTransform = s.fourier(b);
cout << theTransform; // Print it out
Fortran 90 offers substantial improvements, giving the language a level of portability and maintainability roughly equivalent to C. Among the improvements the X3J3 standards committee included were recursion, dynamic memory allocation, and pointers. In other areas the committee went further, adding built-in support for vector arithmetic and structures (TYPE, in Fortran vernacular) and eliminating Do loops in many situations (although the syntax is not pleasing).
Still, many other modern concepts are missing. Fortran is far from supporting key object-oriented concepts such as inheritance and encapsulation, let alone polymorphism. Useful C++ concepts like constructors and destructors and parameterized types are also missing. It is these features, plus a host of other, smaller features (const variables come to mind) that give C++ its unique blend of efficiency and maintainability.
C++'s ability to define whole new types and an accompanying algebra gives the language a chameleon-like quality. A class designer, for example, can make the language look remarkably different, depending on what goals need to be achieved (now it's a database language, now it's a graphical language...). Several new types are particularly well suited for numerics, including vectors, matrices, linear algebra, and transforms.
Vectors. Encapsulating arrays inside a vector class to give them a natural, predictable arithmetic is an obvious place to start. How might such a class look? This depends on how the vectors are constructed, what their arithmetic looks like, and so on. For concreteness, let's look at how a vector of doubles (call it class DoubleVec) might look, starting with a set of constructors that allows us to create new vectors in a predictable way; see Example 2(a). You can then add some basic arithmetic and assignment operations as in Example 2(b), and address individual elements of the vector as in Example 2(c). If, however, you want to set every other element of a vector to some value, you have to write something like Example 2(d), which defeats the advantages of abstraction.
(a)
DoubleVec a; // Null vector - no elements at all, but
can be resized
DoubleVec b(8); // 8 elements long, uninitialized
DoubleVec c(8,1); // 8 elements, initialized to 1.0
DoubleVec d(8,1,2); // 8 elements, initialized to 1, 3, 5, ...
(b)
b = 2.0; // Set all elements of b to 2
b = c + d; // Set b to the sum of c and d
b *= 2; // Multiply each element in b by 2.
(c)
b[2] = 4.0; // Set the 3'rd element of b to 4.0.
c[1] = b[3]; // Set the 2'nd element of c to the 4'th element of b
(d)
DoubleVec a(10, 0); // 10 element vector
for (int i = 0; i<10; i+=2)
a[i] = 1;
Having taken the trouble to encapsulate the array elements into a vector, alarm bells should go off in your head if you start to take the vector back apart and address individual elements. The results are likely to be slow and hard for the compiler to optimize. Instead, you need to tell the program, in some abstract sense, to "set every other element to 1."
The key to maintaining a high level of abstraction is the slice that allows elements separated by a constant stride (say every second element) to be addressed; see Example 3(a). The slice is an extremely powerful abstraction that can be used to implement a variety of algorithms. As an added bonus, the BLA routines have been programmed in terms of slices, so we can take advantage of existing, highly optimized versions of this package to implement our slice arithmetic.
(a)
a.slice(0, 2) = 1; // Starting with element 0; set every other
element to 1
(b)
class DoubleVec {
...
public:
...
operator DoubleSlice(); // For type conversion
DoubleSlice slice(int start, int step, unsigned N) const;
};
// The "helper class";
class DoubleSlice {
DoubleVec* theVector;
int startElement;
unsigned sliceLength;
int step;
public:
...
friend DoubleVec operator+(const DoubleSlice&, const DoubleSlice&);
...
};
(c)
DoubleVec b(10, 0), c(10, 1);
DoubleVec d = b.slice (0, 2) + c.slice(1, 2);
(d)
DoubleVec g = b + c; // DoubleVec to DoubleSlice type conversion occurs
There are two architectural approaches to implementing slices: either using a "helper class" or building the slices into the vector class. Each has advantages, although the role of slices in algorithms is so fundamental that the second approach tends to produce cleaner, more efficient code. Nevertheless, it is useful to take a look at the first approach.
What's a helper class? To answer this, look at Example 3(b), where the actual vector data has not been shown in the interest of clarity. In addition to the basic vector class DoubleVec, there's a second ("helper") class named DoubleSlice, which contains the data necessary to address the "sliced" elements. The member function slice() returns an instance of this class. All the arithmetic operators must be implemented using DoubleSlices as an argument (as per the + operator in the example code) to allow expressions like Example 3(c). This leads to slow code, because a simple vector must continuously undergo type conversion to the helper class; see Example 3(d).
Alternatively, you could implement two versions of the arithmetic operators: one taking the vector as an argument, the other the helper class. This, however, leads to type-conversion ambiguities. Building slices into the vector class leads to simpler code, so type conversion becomes simpler as well.
The lesson here is that helper classes are fine, but when the code fundamentally depends on them, the results are slow and type conversions are always ambiguous. You should reexamine your approach to see if your problem depends on what helper classes are trying to accomplish in some fundamental way. If so, you might be better off finding a way to eliminate them, even if that makes the remaining classes slightly more complex.
Listing One, page 47, illustrates how a vector class with built-in slices might be implemented. (Error checking, efficiency optimization, and code that deals with special cases have been omitted in the interest of clarity.) First comes the vector data, which can be shared by more than one type of vector. It contains a reference count and a pointer to an array of raw, untyped data. The constructor specifies the number of elements in the vector and the size (in bytes) of each individual element.
Listing Two, page 47, outlines the actual vector that includes a pointer to the DataBlock (outlined in Listing One). The reference count in class DataBlock is used to ensure that the vector data is not prematurely deleted if more than one vector is using it. For every vector actively using the DataBlock, the count is increased by 1. When a vector is done, the count is decremented by 1. When the count is 0, no vector is using the block, and it can safely be deleted.
There's also begin, a pointer to the start of data (and the slice), where the actual data typing occurs. This design approach allows multiple types to share the same DataBlock and eliminates one level of indirection at the expense of some extra storage space. As you'll see, it also allows some highly expressive statements.
There's one other wrinkle: the variable step. This is the stride length, the step size between contiguous elements of the vector. A conventional vector is a slice that starts with the 0th element and has a stride of 1.
A slice of an existing vector is created by calling the member function slice(), which in turn, uses a special constructor; see Listing Three (page 47).
The combination of a generalized starting element and stride enables some very powerful and intuitive expressions. For example, it's trivial to return all the real parts of a complex vector as a DoubleVec: It's just a slice of every other element in the complex vector. (Perhaps this isn't the safest approach since it assumes a known structure for type complex. It would fail if someone implemented complex using, say, polar notation. Nevertheless, the functionality could be retained by using a helper class.) The result is that the function in Example 4(a) can be used as an lvalue, as in Example 4(b).
(a) DoubleVec real(const ComplexVec&); (b) ComplexVec a(10, 10, 0); // (0,0), (0,0), (0,0), ... real(a) = 1.0; // (0,0), (0,0), (0,0), ...
Matrices. Matrices are an extremely important part of numerics that can be created by inheriting from a vector; see Listing Four (page 47). Note the member functions col(unsigned) and row(unsigned) that return a column or row, respectively, as a vector slice, allowing expressions like Example 5(a). It's even possible to return the diagonal as a slice; see Example 5(b).
(a) DoubleMatrix a(10, 0); // 10 by 10 matrix, initialized to zero a.row(3) = 1; // Set row 3 to 1 a.col(2) = a.col(4); // Copy column 4 to column 2 (b) DoubleMatrix I(10, 10, 0); // 10x10 initialized to 0 I.diagonal() = 1; // Create an identity matrix
Linear Algebra. Matrix decompositions--LU Decomposition and Singular Value Decomposition (SVD)--occupy a central role in linear algebra. However, pivot indexes, conditioning numbers, nullspace, and range vectors abound. Gathering these into a C++ class makes them much easier to work with. Here's how to do this with LU decompositions.
The LU decomposition of a matrix consists of finding two matrices such that A = LU, where L is a lower-triangular matrix, and U is an upper-triangular matrix. This decomposition can be used to solve sets of linear equations. Listing Five (page 47) illustrates how an LU decomposition class might be structured.
Note the constructor: An LU decomposition is "constructed" from a matrix. For computational reasons, what is actually calculated is the LU decomposition of a row-wise permutation of the original matrix. The vector of ints permute is used to keep track of the original index of each row. The rest of the construction process consists of calculating the lower- and upper-diagonal matrices L and U, which are then packed into a private matrix base class of the same dimension as the original matrix.
Several of the more ugly details of LU decomposition can be hidden by encapsulation. For example, it is of no interest to the user that the L and U matrices are stuffed inside a single matrix; hence the private declaration of the base class. Nor is it the user's concern that a row-wise permutation of the original matrix is being decomposed. Example 6(a) shows how to use such a decomposition class.
(a)
DoubleMatrix a(10, 10);
// ... (initialize a somehow)
// Construct the LU decomposition of a:
LUDecomp aLU(a);
// Now use it:
double det = determinant (aLU);
DoubleMatrix aInverse = inverse (aLU);
(b)
// 5 different sets of linear equations to be solved:
DoubleVec b[5], x[5];
// ... (set up the 5 vectors b and the 5 vectors x, each
// with 10 elements as per the matrix a above)
for (int i = 0; i < 5; i++)
x[i] = solve (aLU, b[i]);
(c)
DoubleMatrix a(10, 10);
// ... (initialize a)
// Calculate the inverse directly from a.
// A DoubleMatrix to LUDecomp type conversion takes place automatically:
DoubleMatrix aInverse = inverse (a);
(d)
DoubleMatrix aInverse = inverse (LUDecomp (a));
You can also use the LU decomposition to solve a set of linear equations ax=b, using the friend function solve(); see Example 6(b).
For the user, who doesn't even want to worry about LU decompositions, type conversion can play an attractive and convenient role. In Example 6(a), the LU decomposition was created first, then used to calculate, say, the inverse of the matrix. However, you could just as well request the inverse of the original matrix and let type conversion do the work, as in Example 6(c).
Seeing no prototype inverse(const DoubleMatrix&), the compiler will look for a way to convert DoubleMatrix a into something for which it has a prototype. When it discovers the constructor LUDecomp(const DoubleMatrix&), the compiler will invoke it to call inverse (const LUDecomp&).
There are, of course, limitations to this approach: If more than one decomposition is possible (SVD, for example), the user must specify the conversion explicitly, lest the compiler issue an error about ambiguous type conversions; see Example 6(d).
Transforms (FFTs and All That Stuff). Any algorithm that requires expensive precalculation before use is a good candidate to become a class. The fast Fourier transform (FFT) is one such algorithm. To transform a vector of length N, the Mth order complex roots of 1 must be calculated, where M is the set of prime factors of N. For example, if N=30, then the 2nd, 3rd, and 5th order complex roots (2x3x5=30) of 1 must be calculated. This is an expensive calculation. If you are going to transform many vectors of that length, you don't want to throw the results away. The solution is to design a server class like that in Listing Six (page 47) to hold these roots: At any given moment, the server class can be configured to transform a vector of a certain length.
The "roots of one" of all the prime factors of a vector of length npts are packed into the complex vector theRoots. They are calculated at three possible times: when the server is constructed, when the user calls setOrder(unsigned), or dynamically, when the server transforms a vector. Because of this last capability, using such a server is a pleasure because you don't have to worry about whether it is configured correctly to transform a given vector. If it's not, it will automatically reconfigure, as in Example 7.
ComplexVec timeVector(30); FFTServer aServer; // Allocate a server // Will automagically reconfigure for a vector of length 30: ComplexVec spectrum = aServer.fourier (timeVector);
Of course, each reconfiguration is expensive, so if you plan to transform a bunch of vectors of varying length, you will probably want to keep many servers on hand. The bookkeeping to do this is far easier with self-contained servers than with the equivalent Fortran approach. (You could even use a hashed table lookup to find the correct server, making a super-server!)
If the price were reduced efficiency, all of C++'s features would be of little interest to Fortran programmers. Consequently, we'll look at two benchmarks that highlight C++'s efficiency.
Nearly all numerical algorithms come down to performing binary operations on large numbers of elements--that's why pipelined architectures on vectorizing machines such as the Crays have been so effective and popular. Hence, it is important to get this right.
Figure 1 compares the time and code required for C++ (using Rogue Wave's Math.h++) versus Fortran to multiply together two vectors as a function of the vector length. The figure demonstrates that, with care, C++ can be even faster than Fortran!
Why? Because with C++ it's easy to isolate the crucial piece of code and treat it right. In this program, the critical expression is DoubleVec c=a*b;. To evaluate this, the compiler will call the function with prototype DoubleVec operator*(const DoubleVec&, const DoubleVec&);. The sole job of this function is to multiply the two operands together and return the results. It's important to note that the context of this multiply is completely controlled, freeing us from traditional C problems such as "aliased pointers": All indexes and intermediate results can be held in registers. Inside the function you can have highly optimized assembly code (as in this benchmark) or a call to a set of specialized BLAs (if available). The result is extraordinarily fast code.
Figure 2 shows a different measure of efficiency, the venerable Linpack benchmark. This benchmark sets up a matrix, factors it (using LU factorization), then solves a set of linear equations using that factorization. Only the factorization and solution time is actually used in the benchmark. (The matrix setup time is not measured.) Figure 2(a) shows a C++ version (using Math.h++), and Figure 2(b) shows a standard Fortran version. The code listings have been set up such that comparable statements line up side-by-side.
(a) #include <dgenfct.h> (b) double precision a(90, 90)
b (90),x(90)
#include <rstream.h> double precision ops,
mflops, norma, normx
double precision resid,
residn, eps
double precision t1, t2,
total
integer ipvt (90)
class DTestMatrix : public
DoubleGenMat {
public:
DTestMatrix (unsigned
order);
};
class DTestRHS : public
DoubleVec {
public:
DTestRHS (const
DTestMatrix&);
};
double second(); double precision second
double epslon (double); double precision epslon
lda = 90
const unsigned N = 90; n = 90
const unsigned long ops
= 2.0*N*N*N/3.0 + 2.0*N*N; ops = (2.0e0*n**3)/3.0e0
+ 2.0e0*n**2
void main() {
DTestMatrix a(N); call matgen (a,lda,n,b,norma)
double norma = maxVal
(abs(a));
DTestRHS b(a);
double t1 = second(); t1 = second()
// Construct the LU
Factorization:
DoubleGenFact fact (a); call dgefa (a,lda,n,ipvt,info)
t1 = second() - t1; t1 = second() -t1
double t2 = second(); t2 = second()
DoublVec x = solve
(fact, b); call dgesl (a,lda,n,ipvt,b,0)
t2 = second() - t2; t2 = second() - t2
double total = t1+t2; total = t1+t2
double mflops = ops
/ (1.0e6*total); mflops = ops/(1.0e6*total)
DoubleVec tol =
a.product(x) - b; do 10 i = 1,n
x(i) = b(i)
10 continue
call matgen (a,lda,n,b,norma)
do 20 i = 1,n
b(i) = -b(i)
20 continue
CALL DMXPY (n,b,n,lda,x,a)
double resid = maxVal (abs(tol)); RESID = 0.0
double normx = maxVal (abs(x)); NORMX = 0.0
DO 30 I = 1,N
RESID = amax1 ( RESID,
ABS(b(i)) )
NORMX = amax1 ( NORMX,
ABS(X(I)) )
30 CONTINUE
double eps = epslon(1.0); eps = epslon (1,0D0)
double residn= resid /
(N*norma*normx*eps); RESIDn = RESID/(
N*NORMA*NORMX*EPS )
cout << "Normalized residual
= " << residn << NL; write (6,1000)RESIDn
cout << "Residual
= " << resid << NL; 1000 format (' Normalized residual
=', g16.7)
cout << "Machine precision
= " << eps << NL; write (6,1001)RESID
cout << "Factorization time
= " << t1 << NL; 100 format ('Residual
= ', g16.7)
cout << "Solution time
= " << t2 << NL; write (6,1002)eps
cout << "Total time
= " << total << NL; 1002 format (' Machine precision
= ', g16.7)
cout << "MFLOPS
= " << mflops << NL; write (6,1003)t1
} 1003 format (Factorization time
= ', g16.7)
write (6,1004)t2
1004 format (' Solution time
= ', g16.7)
write (6,1005)total
1005 format (' Total time
= ', g16.7)
write (6,1006)mflops
1006 format (' MFLOPS
= ', g16.7)
stop
end
DTestMatrix::DTestMatrix
(unsigned n) : DoubleGenMat (n,n) { subroutine matgen
(a,lda,n,b,norma)
double precision
a(lda,1),b(1),norma
long init = 1325; init = 1325
for (int j=0; j<n; j++){ norma = 0.0
for (int i=0; i<n; i++){ do 30 j = 1,n
init = 3125*init % 65536; do 20 i = 1,n
sub(i,j) =
(init-32768.0)/16384,0; init = mod(3125*init,65536)
} a(i,j) = (init - 32768.0)/
16384.0
} norma = amax1(a(i,j),
norma)
} 20 continue
30 continue
DTestRHS::DTestRHS(const DTestMatrix& a): do 35 i = 1,n
DoubleVec(a.rows(), 0.0) { b(i) = 0.0
35 continue
do 50 j = 1,n
for(int i=0; i<length(); i++) do 40 i = 1,n
(*this)(i) = sum(a.row(i)); (b)i = b(i) + a(i,j)
} 40 continue
50 continue
return
end
Total (nonblank) lines = 52 Total (nonblank) lines = 71
Executable size = 64868 bytes Executable size = 87482 bytes
(Borland C++ V2.0 w. optimizer & (Microsoft Fortran
8087, large memory model) V5.0 w. optimizer & 8087)
16 MHz 386 w. 80387: 16 MHz 386 w. 80387:
Normalized residual = 1.24482 Normalized
residual = 1.212636
Residual = .4973799e-13 Residual
= .4843265E-13
Machine precision = .2220446e-15 Machine
precision = .2220446E-15
Factorization time = 3.97 Factorization
time = 4.230000
Solution time = 0.13 Solution time = .170000
Total time = 4.10 Total time = 4.400000
MFLOPS = 0.122 MFLOPS = .1141364
First, note the use of inheritance to guarantee that the test matrix and the "right-hand side" of the sets of linear equations (DTestMatrix and DTestRHS, respectively) are set up correctly. While this uses a few extra lines of code, it recognizes these two objects for what they are: unique and "special," requiring a certain (and, in this case, intricate) initialization sequence. While we could have used the Fortran approach and created a "blank" vector and matrix to be passed onto a special function to be initialized, in a large project we might neglect to do this, risking an improper initialization. Yet because of inheritance, these special objects inherit all of the abilities of their underlying base classes.
Second, note how simple and more intuitive the calculation of a tolerance, residual, and norm becomes. Finally, note that despite the extra abilities of the C++ code in terms of type checking, dynamic memory allocations, and safe object construction, it still requires fewer lines of code. It also executes faster!
Up to this point, I've shown how the encapsulation and operator-overloading properties of C++ can give rise to an impressive and pleasing economy of expression for such "in the small" objects as vectors, matrices, FFT servers, and the like. This is useful, but not enough for working with big, gnarly projects where problems escape faster than they can be contained.
Suppose you wanted to model the motions of a vibrating string under the influence of a spatially and temporally varying force of (as yet) unknown origin. Figure 3 shows the governing equation, where u is the string displacement, x is space, t is time, c{2} is the string tension over the string density, and F(x,t) is the external force applied to the string.
How might we model such a problem? Example 8 shows one solution. Here's how abstraction of the problem is enforced, line by line.
class Force: // 1
class String { // 2
public:
String(double length, double tension, double density); // 3
void setPoints(int nx); // 4
void timeStep(double dt, const Force& force); // 5
DoubleVec displacement() const; // 6
private:
DoubleVec u; // 7
double cSquared;
double length;
};
Now look at the abstract class Force. A fundamental assumption is that we do not know much about its nature. Indeed, it may even depend on the string displacement. (For example, the force of the wind on a bridge depends on the displacement of the bridge.) See Example 9(a).
That's it. You can use inheritance to define the actual details of the force. For example, a wind-type force acting on the string might look like Example 9(b).
You explicitly recognize that the force of the wind depends on the displacement of the string by requiring that a specific string be used in the constructor (as well as on a drag coefficient). This WindForceString object will then track the string, asking it for its present displacement, before calculating the resultant force of the wind on the string and returning it. This is done by calling the virtual function value(), an example of polymorphism (a fancy word for runtime binding). Example 9(c) shows how value() might be implemented.
(a)
class Force {
public:
virtual DoubleVec value() = 0; // 1
};
(b)
class WindForceString : public Force {
public:
WindForceString{ String& string, double dragCoeff );
void setVelocity(double windspeed);
virtual DoubleVec value(); // 1
private:
String& myString; // The string
we are tracking
double wind; // Present wind
speed.
double drag;
};
(c)
DoubleVec WindForceString::value()
{
// Get present string displacement;
DoubleVec d = string.displacement();
return - drag*wind*wind*d; // Some (bogus) calculation
}
It then becomes trivial to replace the forcing function, even at run time, with another type of force. This is more than just passing a generic vector of doubles to the String that represents the forcing function. You can have an actual object, complete with feedback loops to the string, act as the forcing function.
C++ has tremendous potential in numerics, one that has gone largely unnoticed by fans of object-oriented programming, perhaps because previous OOP languages lacked the efficiency required to do numerics. C++ has this efficiency--and a lot more.
Copyright © 1992, Dr. Dobb's Journal