June 1993/Curve Fitting with Extrapolation

Scientific and Numerical Applications

Curve Fitting with Extrapolation

Lowell Smith

Lowell Smith is a retired aerospace engineer (Hercules Aerospace Company). He first programmed in FORTRAN in 1962 on the IBM 1620, then later on the IBM 1130 and various IBM 360/370 models. After getting an IBM PC in 1982, he began using Pascal and C. He has never worked as a programmer, but has programmed many engineering applications over the years. He can be reached via 73377.501@compuserve.com.

Introduction
Engineering models frequently require functional representations of empirical data that may be defined at arbitrary intervals. A least-squares fitted polynomial provides a simple approach to the problem, but may deviate from expected values between data points, particularly for polynomials of higher degree.
Chebyshev polynomials can be used for a fit which very closely approximates the optimal minimax solution — minimizing the maximum deviations. Algorithms for implementing the method are well documented in Numerical Recipes in C (Press et al., 1992). Prince (December 1992) describes the use of this method for fitting analytical functions such as sin(x), cos(x), etc.
The use of Chebyshev polynomials requires evaluation of the candidate function at specific values of x. This is no problem for continuous analytical functions, but engineering data to be fit are usually available at only a limited number of points.
The algorithm described in Numerical Recipes in C for interpolation/extrapolation using rational functions provides a means for accurately estimating data at the abscissa values required for use of Chebyshev polynomials. It has the added benefit that it performs surprisingly accurate extrapolation for a wide variety of data.
The RPFT code described here is based on the algorithms defined in Numerical Recipes in C. It fits both polynomials and rational functions to arbitrary sets of (x, y) data points. The rational function has the form:

The extrapolation feature permits fitting somewhat beyond the limits of the available data.

Operation
The program requirements are defined in the help screen shown in Figure 1. The help screen is displayed whenever a command-line error is encountered and when the program is invoked with no command-line arguments.
Checking is provided to trap most input errors. Optional keyboard input for calculating test results is permitted at the end of the run. Those data are echoed to the screen via stderr to avoid confusion when stdout is redirected.
To illustrate the use of the program with extrapolation beyond the available data, I fit a fifth degree polynomial to the heat of formation data for HCl in water. The input data shown in Listing 1 was taken from Wagman, et al. (1992) with some data omitted. The logarithmic transformation of x values was used because of the extended range of the data. The upper limit for the curve fit was chosen well beyond the included data to illustrate the use of extrapolation. A screen dump for the run is given in Figure 2.
The resulting polynomial is plotted in Figure 3 along with a conventional least-squares polynomial. Data points not included in the analysis are shown to verify the effectiveness of the extrapolation.
The maximum error is 1.03 at x = 1.5. For a better fit I would use two polynomials, possibly with overlapping ranges, particularly if reasonable extrapolation beyond the lower limit were needed.
Rational functions often give a better fit and also can be extrapolated for moderate distances beyond the range over which they are fit. For the second example, I fit a rational function to sin(x) for x between 0 and 3.14. The input data at intervals of 0.2 (except the last point) were provided with 16 significant digits. The option -DIG = 16 was used to get maximum accuracy in the coefficients. Fourth order polynomials were specified for both numerator and denominator (NN = 4 and ND = 4).
The results are shown in Figure 4. Note that the error curve reflects the minimax nature of the fit between the lower and upper limits. Also, the rational function extrapolates reasonably well beyond the limits. The error at x = -0.5 is 0.00584 and that at 3.5 is 0.000031.
The maximum error is 0.28 for x = -3, which is well outside the limits of the curve fit. When the analysis is redone with limits of -3 and +6 to take advantage of the extrapolation feature of the code, the maximum error is 0.00328 between those limits.

Code Structure
The code is given in Listing 2. It is normally compiled on a PC in the small model. It will compile and run without error in the large model, but that should never be necessary. It will accept over 1,000 data points using the small model on a machine with 640KB memory. The allocation failure error messages indicate the location of the error in the code if problems are encountered.
All program options are specified on the command line as indicated in the help screen shown in Figure 1. main first calls commd to parse the command line and check for obvious errors.
The inpt routine is then called to read the (x, y) data from the input file and check for obvious errors such as missing values or negative values with logarithmic transformation. inpt then calculates either the Chebyshev polynomials or the abscissa and function values required for the rational polynomial fit. The diagonal rational function interpolation routine ratint is used for interpolation or extrapolation to determine values between or beyond data points as required.
The arrays required for the input data are freed prior to return. If NN is greater than 0, ratlsq is then called for the rational function analysis. The number of abscissa values used is npt, which is NPFAC times the total number of coefficients. Numerical Recipes in C recommends that you #define NPFAC to be 8, and that is used here. In some cases a better fit might be possible with a larger value.
The analysis proceeds by iteration with the weighting factors wt[] and ee[] adjusted after each iteration to minimize the maximum error. Functions dsvdcmp for singular value decomposition and dsvbksb for back substitution are used for the solution in each iteration.
The estimated maximum absolute error is displayed after each iteration. The coefficient set returned is that for the iteration with the smallest estimated error.
If ND is zero, chebpc is called to transform the Chebyshev coefficients generated by inpt to the interval -1 to 1. pcshft then generates the polynomial coefficients. sprintf and sscanf are used to round the coefficients to the specified number of digits (default is 7).
For the polynomial fit, a single column of coefficients is listed. For the rational function, a column is listed for both the numerator and the denominator.
At the end of the run, optional input for trial x values is provided. This, like all error messages, is directed to stderr to avoid confusion when output is redirected.

References
Press, W.H., et al. 1992. Numerical Recipes in C, Second Edition, Cambridge, MA: Cambridge University Press.
Prince, Tim. December, 1992. "Tuning Up Math Functions," C Users Journal. Lawrence, KS: R&D Publications.
Wagman, D.D., et al. 1992. "The NBS Tables of Chemical Thermodynamic Properties: Selected Values for Inorganic and C1 and C2 Organic Substances in SI Units," Journal of Physical and Chemical Reference Data. Vol. 11, Supplement No. 2. National Bureau of Standards, Washington, CD 20234.