November 1999/A Concise Method For Evaluating Derivatives

Scientific

A Concise Method For Evaluating Derivatives

Ronald E. Huss, Mark A. Pumar, and Robert L. Rudin

Is your differential calculus a bit rusty? Here's a class that can lend a hand.

Introduction

The need to evaluate derivatives is often encountered in solving a wide variety of scientific and numerical problems. Applications and algorithms for resolving nonlinear algebraic equations, two-point boundary-value problems, integral equations, and nonlinear differential equations (to name a few), usually require evaluating the first, second, and sometimes higher derivatives of the equations of interest. In many cases, these equations are expressed in terms of sums and products of the standard transcendental functions: sine, cosine, logarithm, exponential, etc. Except for the most trivial cases, finding symbolic expressions for the first- and, especially, second-order derivatives is a time-consuming and error-prone task.

With the advent of sophisticated symbolic manipulation packages, the difficult task of finding expressions for derivatives has been ameliorated by use of software. However, transferring and translating the symbolic outputs of such applications into algorithmic code — in C, for example — is still quite error-prone. Even when completed and debugged, the resulting code often contains multiple lines of code containing the sums and products of numerous math library functions for each derivative. Such code is usually, at best, less than illuminating when related back to the defining problem. At worst, it is very difficult to modify and maintain.

Applying the chain and product rules of calculus [1] greatly simplifies the evaluation of complicated derivatives. The chain rule relates the derivative of a composite function, f(x) = g(h(x)), to the derivatives of the individual functions:

f' = dg(h(x))/dx = (dg/dh)(dh/dx)

where f, g, and h are functions and x is the independent variable. The product rule relates the derivative of the product of two functions f(x) = g(x)h(x) to the sum of the products of the functions and their derivatives:

f' = gh' + hg'

Instead of finding a symbolic expression for the more complicated composite function, the chain and product rule can be employed to express this derivative in term of the sums and products of the derivatives of the simpler constituent functions. This fact was first realized by Wengert [2] and later by Kalaba et. al. [3-5] who refined and demonstrated this result in FORTRAN and BASIC. Kalaba gave the name FEED, an acronym for Fast and Efficient Evaluation of Derivatives, to this technique. Unfortunately, languages like FORTRAN and BASIC cannot fully take advantage of the power of these rules. The symbolic derivatives of the constituent functions must still be determined and explicitly placed in the code. This process is error prone and obscures the underlying equations. Consider the simple case:

f(x) = xe^x
Let g(x) = x and h(x) = e^x. Then applying the product rule:

f' = gh' + hg' = xe^x + e^x

Because the derivative of the exponential function is itself the exponential function, it is not immediately obvious which terms in the above expression correspond to g, h, g', and h'. If f is modified to include a minus sign in the exponential, f(x) = xe^-x, then how is the expression for its derivative affected? This problem is further compounded if the expression for the derivative is regrouped:

f' = (x + 1)e^x

Clearly, these problems are magnified as f becomes more complex. With languages like FORTRAN, all of these difficulties are either directly or indirectly attributable to the inability to overload the fundamental operations of addition and multiplication. C++ removes this difficulty, and makes it possible to write a small class that encapsulates the basic properties of derivatives by overloading the arithmetic operators. This class, called CFeed, allows complex derivatives to be expressed conveniently and naturally within the source code of an algorithm or analysis program. In addition, it provides an excellent and meaningful example of operator overloading in C++.

The CFeed Class Design

The CFeed class is designed to encapsulate the expression of complex derivatives in terms of the derivatives of the simpler fundamental math functions. These functions include the trigonometric functions (sine, cosine, tangent) and their inverses (arcsine, arccosine, arctangent), the hyperbolic trigonometric functions (sinh, cosh, tanh), the exponential functions (exp, pow) and their inverses (log, log10), and the square root function. In addition, the behavior of the derivatives of these functions is also encapsulated by the class.

The class determines derivative behavior by applying the product and chain rules of calculus. If f, g, and h are represented by CFeed objects and f = gh or f = g(h), then the derivatives of f are determined automatically by the product or chain rule. The CFeed class encapsulates this behavior via operator overloading so that the mere inclusion of the statement:
f = g*h;  
or
f = g(h);
in the source code yields all the derivative for the function f. This is accomplished by overloading not only the arithmetic operators (+, -, *, /) but also the assignment (=) and call (()) operators. In addition, class CFeed declares the fundamental math library functions as friends, so that statements like:
g = exp(x);
and
h = sin(x);
are valid and correct within the source code. This permits the natural and convenient expression of complicated functions (and their derivatives) in terms of the basic functions contained in the math library.

The CFeed Class

The CFeed class consists of four data members, two constructors, a destructor, a size function, and overloaded operators as shown in Figure 1. The transcendental math functions in the standard math library (<math.h>) are declared as friends of this class. With the exception of its constructor, destructor, and size function, all the member functions of this class are overloaded operators. The CFeed class supports extension through inheritance by declaring all data members as protected and all function members as public. It supports output access to data members through operator() and operator<<. Input access is provided only through constructors. We have compiled and tested the source code under Windows NT, Windows 95, and Unix. We used both Sun and SGI machines for Unix testing.

CFeed Data Members

The CFeed class contains four data members:

m_N represents the number of independent variables;

m_f (type unsigned long) holds the base function value;

m_df (type double *) holds the first partials;

m_ddf (type double *) holds the second partials.

As explained in the conclusion, CFeed does not include derivatives of order three or greater. The "size" of a CFeed variable is the number of independent variables in the encapsulated function. Since the number of independent variables determines the number of first- and second-order partial derivatives, m_N gives the sizes for the first- and second-partial derivative arrays, m_df and m_ddf. The size of these arrays are m_N and m_N*m_N, respectively. Strictly speaking, the size of m_ddf need only be m_N*(m_N+1)/2 since the order of the derivatives in the second partials is usually unimportant; however, m_ddf is allocated with length m_N*m_N to simplify the code and provide for the more general case. The additional memory required is insignificant for most practical applications, in which m_N is unlikely to be much greater than four.

CFeed Function Members

The class function members consist almost exclusively of overloaded operators. These operators provide the core functionality of the class and consist of the standard arithmetic operators: addition, subtraction, multiplication, and division. In addition, operator() is overloaded to provide access to data members and to support, in a natural way, the chain rule. The remaining functions consist of constructors, a destructor, a size function, and equality operators, as well as non-member friend functions. Each function group is discussed in detail below. Figure 2 shows the significant member functions of the class for each of these categories. You'll find a complete listing at the CUJ web site listed on page 3.

Constructor/Destructor Group

Two constructors are defined for class CFeed. The first is the default constructor:
CFeed::CFeed
   (
    unsigned long N = 0,
    double f = 0,
    double *df = 0,
    double *ddf = 0
   );
This constructor supplies zero as the default assignment to each argument. The first argument is the variable size (i.e., the number of independent variables), the second is the base function value. The inputs df and ddf are pointers to arrays or allocated memory of size N and N squared, respectively. The default constructor checks that these pointers are non-zero and assigns their contents to m_df and m_ddf, which are internally allocated.

The constructor does not check for the successful allocation of memory for m_df and m_ddf; the code assumes that the C++ new_handler mechanism exists and has been properly initialized before any CFeed class instances are declared or allocated. This memory allocation strategy is employed throughout the code.

The second constructor is the class constructor:
CFeed::CFeed(CFeed &feed);
which initializes one CFeed class instance with another. Again, memory is internally allocated for m_df and m_ddf.

The class destructor frees memory allocated for m_df and m_ddf and zeros all the data members. Zeroing of data members is employed internally in the class to determine if memory has been allocated for the partial derivative arrays. This helps prevent attempts to access memory that has not been allocated in the class member functions. The size of these members can be altered explicitly or implicitly by most of the other operators in the class (see below).

The assignment operator is part of the constructor/destructor group; it is used to set one CFeed object equal to another. This operator deallocates the memory associated with m_df and m_ddf and reallocates memory to these pointers corresponding to the size of the assignment. The two constructors and operator= are the only three functions in the class that explicitly allocate memory. However, since these functions are used internally in most of the other operators, memory is implicitly allocated and deallocated in most the remaining operators.

Arithmetic Group

This group consists of operator+, operator+=, operator-, operator-=, operator*, operator*=, operator/, and operator/=. Each operator is overloaded two ways to handle operations between CFeed objects and doubles. The operator+=, operator-=, operator*=, and operator/= contain the code that performs the core functionality for each operator; the remaining operators use these operators to accomplish their tasks in three to five lines of code. As such, the "X=" operators form the core of the arithmetic operator class code. This code is similar for each operator.

Arithmetic operations on CFeed objects result in a CFeed object whose size is equal to the size of the largest of the two operands. This reflects that fact that if h(x,y) is a function of two independent variables and g(x,y,z) is a function of three independent variables, then their sum or product, f(x,y,z), is a function of three independent variables. As such, the first task in any arithmetic operator is to determine the size of the result. This is accomplished by the following code fragment at the beginning of each of the core operators:
CFeed THIS(*this);
if(m_N < feed.m_N)
   *this = feed;
The first line defines a temporary CFeed object that is a copy of the calling object. The second and third lines change the size of the first operand by equating it to the second operand if the second operand is larger. The remaining code uses the temporary THIS object and the second operand, feed, to compute the result and store it in the enlarged space. Note the first and third lines in the code fragment involve the constructor and equality operator, which in turn allocate and deallocate memory using new and delete. Thus the core arithmetic operators implicitly manage memory.

Special Operations Group

This group contains two operators: operator- and operator(). The minus operator is overloaded to provided a unary minus or negation for the class. This is one of the few operators that does not involve memory management. It essentially consists of a double nested for loop that negates each data member in the class.

The chain rule is implemented using operator(). If g is a CFeed object of size one and h is a CFeed object of, say, size two, then f = g(h) is a CFeed object of size two and is the composite of g and h. Note: operator() assumes that its associated object is of size one; it uses only the first elements of the m_df and m_ddf arrays and ignores any additional values. A size zero object returns a copy of itself.

I/O Group

The I/O group also contains two functions. The member function size simply returns the size of the variable, m_N. The second function overloads operator() to provide access to the remaining class data members. If f is a CFeed object then:

f() returns the base value of the class, m_f;

f(i) returns the ith first partial, m_df[i], and;

f(i,j) returns the (ith, jth) second partial, m_ddf[i*m_N+j].

If either of the indices i or j is out of range, operator() returns zero.

Class Friend Functions

As a convenience, overloads are provided for all the transcendental functions in the standard math library (<math.h>). These overloaded functions are declared as friend functions of the class. With the exception of the pow function, each function takes a single argument, a CFeed object, and returns a CFeed object.

The transcendental functions are: exp, log, log10, sqrt, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, and pow. The pow function is multiply overloaded to handle two CFeed objects and a CFeed object and a double. pow always returns a CFeed object. The atan2 function is not included as part of the CFeed math convenience functions.

None of the overloaded math functions check for inappropriate values (e.g., log(0)). However, the comments at the beginning of each function list the correct range of input values. Most employ

operator() to compute their result. The pow function requires special attention to the appropriateness of its inputs. In particular, the pow(double g, CFeed h) and pow(CFeed g, CFeed h) implementations utilize log(g) and log(h.m_f), respectively, in the calculation of the result. As such, g and h.m_f must be positive and non-zero in order to obtain valid results. The final friend function is operator<<, which is used to print the class data member values. Each member is printed on a single line and labeled for easy identification. This operator can be employed to print the object to either the screen or a file.

A Usage Example

A simple example illustrates usage of the CFeed class. We use a damped sine wave, for its familiarity and ubiquity, and the rapidity with which its derivatives become complicated. The function for a damped sine wave is:

f(t) = Ae^-ktsin(wt)

Here A is the amplitude of the wave, k is its dampening constant, and w is its frequency. The time is represented by the independent variable t. Because of the exponential term, both product and chain rule are required to compute the symbolic derivatives. This results in a first derivative that contains the sum of two terms similar to the original function and a second derivative that contains four such terms:

f'(t) = -kAe^-ktsin(wt) + wAe^-ktcos(wt)

f"(t) = k²Ae^-ktsin(wt) - wkAe^-ktcos(wt) - kwAe^-ktcos(wt) - w²Ae^-ktsin(wt)

Of course, these expressions can be simplified by factoring out the exponential and grouping the sine and cosine terms, but then the relationship to the generating function is obscured. With the CFeed class, computing f(t) and its derivatives is as simple as writing the function in the source code. This is shown in Figure 3, which contains a small program for calculating f(t) and its derivatives.

After declaring two CFeed variables, performing some constant initializations, and opening the output file, the program contains just three statements. The first statement:
t = CFeed(1, (double) i, &dt);
converts the loop counter into a CFeed time. Note that since f(t) is a function of only one variable, t, the CFeed time is initialized with a size of one, a base value equal to i's initial value, a first derivative of 1, and a second derivative of zero (via the default argument). The first derivative of the independent variable with respect to itself is always one and the second derivative is always zero, as indicated in this statement. Once the time is properly constructed, f(t) itself and its derivatives are computed in the second statement:
f = A*exp(-k*t)*sin(w*t);
Here the overloaded exponential and sine functions declared as friends to the class are utilized to calculate f(t) and its derivatives. The overloaded multiplication operator is also used to multiply CFeed variables and CFeed objects and doubles. The final statement prints the results to the output file. The overloaded function call operator is used here to retrieve the value of the function and its first and second derivatives. Figure 4 shows a plot of the output values generated by running this program.

The contrast between Figure 3 and a program using the more traditional approach of including the derivatives directly into the source code is significant. Such a program would include the expressions for the derivatives given above. Most likely these expressions would be compacted by regrouping; however this still leaves several difficulties. First, these expressions must be transferred into the source code without error. This is not insignificant, due in part to the number of terms involved, and to the flattening of the symbolic notation. This is especially apparent with the exponential term, e^-kt, which becomes exp(-k*t). The loss of superscripting flattens this expression in the source code, making it harder to visually correlate with the symbolic original. Although minor, this effect is cumulative and adds to the difficulty of coding, debugging, and maintenance as the complexity of the expression increases.

The second difficulty arises when modifying the code. If the definition of f(t) changes, then its derivatives must be recomputed and reinserted into the source code. Even minor alterations like changing the sine to a cosine presents significant work to the user of such a program. This leads directly to the last difficulty: the derivatives must be symbolically computed (and recomputed for changes) before the source code can be written. Although this task is relatively easy with the appropriate software tools, it does require access to these tools every time changes are required. With the CFeed class, these problems vanish. In the case of changing the sine to a cosine, all that is required is to, well, change the sine to a cosine in the source code! Changing f(t) completely presents no greater difficulty than merely writing the new expression into the code.

Conclusion

As noted above, the CFeed class facilitates the calculation of derivatives by encapsulating their basic properties and behaviors. This class works with functions having any number of independent variables and with the functions in the standard math library. That said, several points are noteworthy. First, the class limits derivative calculation to second order. There are several reasons for doing this. First, generalization of the code, although possible, significantly complicates the class member functions and obscures the underlying process (chain and product rules). Second, for most scientific and numerical applications, derivatives beyond the second order are rarely encountered. This is evidenced by noting that majority of differential equations encountered in Physics are second order or less. Finally, the class can be extended through inheritance to calculate derivatives of higher order if required.

This leads to the second point. The CFeed class is limited to the functions in the standard math library. Clearly, other functions worthy of inclusion do exist. Perhaps most noteworthy are Bessel functions and the like often encountered in scientific applications. Inheritance would play a major role in extending this class to include such functions when required.

Finally, it should be noted that the class does not require the explicit use of known functions. It handles empirical functions equally well if their derivatives are known. This could prove useful in applications such as navigation and flight control, in which instrumentation measures derivatives such as acceleration, angular acceleration, and stability derivatives of an aircraft.

References

[1] G. Sherwood, and A. Taylor. Calculus (Prentice-Hall, New Jersey, 1954).

[2] R. Wengert. "A Simple Automatic Derivative Evaluation Program," Communications Of The ACM 7, (463-464), 1964.

[3] H. Kagiwada, R. Kalaba, N. Rasakhoo, and K. Spingam, Numerical Derivatives And Nonlinear Analysis (Plenum, New York, 1986).

[4] R. Kalaba, N. Rasakhoo, and A. Tischler, "Nonlinear Least Squares Via Automatic Derivative Evaluation," Applied Mathematics And Computation 12, (119-137), 1983.

[5] R. Kalaba, and A. Tischler, "Automatic Derivative Evaluation In The Optimization Of Nonlinear Models," The Review Of Economics And Statistics 66, (653-660), 1984.

Ronald E. Huss is a consultant and former employee of Hughes Aircraft Co. in El Seguendo, CA. He has a Ph.D. in bio-medical engineering. His research interests include solving complex optimization problems involving radar and other sensor systems.

Mark A. Pumar is a Senior Software Engineer of Raytheon ITSS in Pasadena, CA. He has MS degrees in Physics and Electrical Engineering from UCLA. Mark is currently working on the ground data processing system for the shuttle radar topography mission at JPL. Mark can be reached via email at: map@sdsio.jpl.nasa.gov or MAPar@aol.com or by phone at +1.626.744.5406.

Robert L. Rudin is a Senior Software Engineer at COMPTEK Federal Systems Inc. in Camarillo, CA. He has a BA in Math from UCLA. Bob is currently working on ICAP III for the EA-6B aircraft. He can be reached via email at brudin@comptek.com.