Michael Brannigan is President of Information and Graphic System (IGS), 15 Normandy Court, Atlanta, GA 30324, (404) 231-9582. IGS is involved in consulting and writing software in computer graphics, computational mathematics, and database design. He is presently writing a book on computer graphics algorithms. He is also the author of The C Math Library EMCL, part of which is the set of routines presented here.
Fitting curves to data ranks as one of the most fundamental needs in engineering, science, and business. Curve fitting is known as regression in statistical applications, and nearly every statistical package, business graphics package, math library, and even spreadsheet program can produce some kind of curve from given data. Unfortunately, the process and the underlying computational mathematics are not sufficiently understood, even by the software firms producing the programs. It is not difficult, for example, to input data for a linear regression routine to a well-known statistical package (which I shall not name) used on micros and mainframes for which the output is incorrect.
Constructing a functional approximation to data (the formal act known as curve fitting) involves three steps: choosing a suitable curve, analyzing the statistical error in the data, and setting up and solving the required equations. Choosing a suitable curve is a mixture of artistic sensibility and a knowledge of the data and where it comes from. Analyzing statistical error can be something of a guessing game and requires some thought. Setting up and solving the equations is computationally the most interesting step. It is here that many programs fail because they use computationally unstable methods, but more on that later.
The number of methods for data fitting is legion and we suggest some in this article. However, we give only one method in full and consider only 2-D data. Anyone interested in other specific data fitting techniques may contact the author.
Problem
Given data points (xi,yi)i=1,...,n we suppose there exists a relation
yi = F(xi) + ei,  i = 1,...,n

where F(x) is an unknown underlying function and ei represents the unknown errors in the measurements yi. The problem is to find a good approximation f(x) to F(x). We thus need a function f to work with and some idea, however minimal, of the errors.
How To Choose f
f will have variable parameters whose correct values (the values that solve the approximation problem) are found by solving a system of equations, each data point defining one equation. We call the function f linear or non-linear according to whether it is linear or non-linear in its parameters.

Consider some of the general principles involved in choosing a suitable function f. We must have more data points than parameters; otherwise f will fit the data exactly, errors and all, and we will have modeled the noise rather than the underlying function. Unless absolutely necessary, don't use a non-linear f; solving systems of non-linear equations uniquely is, except for special cases, nearly impossible. In most cases polynomials are not a good choice; they are wiggly curves and nearly always wiggle in the wrong places.
The best option in most cases is to use piecewise polynomials. The example we give is a piecewise cubic polynomial such that the first derivatives are continuous everywhere.
(You can, of course, use cubic splines if you want second derivatives to be continuous, but in most cases the example set out here is superior for a general purpose curve fitting routine. If you want the full cubic spline, please use the B-spline formulation, no other, otherwise you get unstable systems of equations resulting in incorrect solutions. Using the B-spline formulation for spline approximation, you need only change the routine coeff_cubic() (Listing 1) in the program given in this article. The system of equations is solved by the same routines.)
Once f has been chosen and applied to each data point, we obtain a system of linear equations to solve, where the number of equations is greater than the number of unknowns. Such a system is called an overdetermined system, and no exact solution exists; that is exactly what we want, since an exact fit would reproduce the errors as well as the underlying function. However, overdetermined systems have an infinite number of inexact (approximate) solutions; we will seek an approximation that minimizes some particular error measure. (Mathematicians call these error measures "norms.") Thus the problem of curve fitting becomes an optimization problem.
Of the infinitely many possible norms, three should be considered for any curve fitting package: the L1-norm, the L2-norm (least squares norm), and the minimax (Chebyshev) norm. (These norms are defined later in this article.) Fortunately, good algorithms exist for solving overdetermined systems of linear equations in all three norms. For the L1-norm and the minimax norm, you use a variation of the simplex method of linear programming; for the L2-norm you use a QR decomposition of the matrix, in preference to the computationally unstable method of solving the normal equations. (We cannot give all the program code here as space is limited, but for more guidance the reader can contact the author.)
Of the many possible combinations, the following is a good general-purpose option.
Solution
We have data points (xi,yi)i=1,...,n. Let each xi belong to some interval [a,b]. Specify k points Xj, j = 1,...,k, on the X-axis; we call these points knots. These knots are such that
a = X1 < X2 < … < Xk = b

We can now define our function as follows: for each x in the interval [Xj,Xj+1] define the cubic polynomial
y = [(d³ - 3dp²(x) + 2p³(x))Yj + dp(x)q²(x)Yj' + (3d - 2p(x))p²(x)Yj+1 - dp²(x)q(x)Yj+1'] / d³

where
d = Xj+1 - Xj
p(x) = x - Xj
q(x) = Xj+1 - x

Thus y is a cubic polynomial with the linear parameters Yj, Yj', Yj+1, Yj+1', which are the values and first derivatives at the knots Xj and Xj+1 respectively.

For each data point we obtain one linear equation, so we can set up n linear equations in the 2k unknowns Y1, Y1', ..., Yk, Yk'. In matrix form this can be written as
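One piece of the curve can be evaluated directly from these definitions. Here is a minimal sketch (the function and parameter names are mine, not those of the listings); note that the Yj+1' term must carry a negative coefficient, which is what makes the derivative at the right knot come out equal to Yj+1':

```c
/* Evaluate the Hermite cubic on [Xj, Xj1] from the values Yj, Yj1 and
   the slopes YPj, YPj1 at the two knots. */
double hermite_cubic(double x, double Xj, double Xj1,
                     double Yj, double YPj, double Yj1, double YPj1)
{
    double d = Xj1 - Xj;        /* knot spacing          */
    double p = x - Xj;          /* offset from left knot */
    double q = Xj1 - x;         /* offset to right knot  */

    return ((d*d*d - 3.0*d*p*p + 2.0*p*p*p) * Yj     /* value at left knot   */
          + (d*p*q*q)                       * YPj    /* slope at left knot   */
          + (3.0*d - 2.0*p) * p*p           * Yj1    /* value at right knot  */
          - (d*p*p*q)                       * YPj1)  /* slope at right knot  */
          / (d*d*d);
}
```

A quick sanity check: with Yj = 0, Yj+1 = 1 and both slopes equal to 1 on [0,1], the piece reproduces the straight line y = x.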
AY = b

where A is a block-diagonal matrix, Y is the vector of unknowns, and b is the vector of y values. Because A is block-diagonal, for very large data sets optimal use should be made of this structured sparsity.

With the same knots we could also define cubic B-splines and then fit a cubic spline to the data. We would again arrive at an overdetermined system of linear equations with a matrix of coefficients having block-diagonal structure. In fact the equations set out above form a cubic spline with each knot Xj a double knot.
Choosing A Norm
For each possible solution Y we have errors si, i = 1,...,n, such that
AY - b = s

where s is the vector of si values.

The L1-norm is defined to be
sum(abs(si) : i=1,...,n)

The L2-norm, or least squares norm, is
(sum(si² : i=1,...,n))^1/2

And the minimax, or Chebyshev, norm is
max(abs(si) : i=1,...,n)

We solve the overdetermined system of equations by finding the vector Y that minimizes one of these norms. The choice of norm depends on the unknown errors ei; we hope that the chosen norm will give errors si that mirror these unknown errors. The general rule is: choose the L1-norm if the ei are scattered (belong to a long-tailed distribution); choose the L2-norm if the ei are normally distributed; choose the minimax norm if the ei are very small or belong to a uniform distribution. Research indicates that the errors in real data sets are modeled better by the L1-norm than by the L2-norm. (Errors in data are never normally distributed, neither exactly nor in the limit. The assumption of normally distributed errors is built into most packages; the user should question it very carefully.) So when you don't know how the errors are distributed, use the L1-norm. The minimax norm is rarely used for fitting curves to experimental data. However, always use the minimax norm if you want to fit a function to another function, for example fitting a Fourier series to a complicated function whose values you know exactly.

Whichever norm you choose, the computer solution of the equations is not straightforward. You must choose an algorithm that is computationally stable. (A computationally unstable algorithm is one that is mathematically correct but, when run on a computer, produces wrong answers: solving linear equations without pivoting, for example, or solving quadratic equations directly from the well-known formula. So get some professional help in choosing the algorithm.)
Program
After you have spent some time analyzing your particular data fitting problem, decided upon a suitable function to approximate the data, and decided upon the norm to use for the errors in the data, you must program the result. Unless your application requires special functions, the approximating function set out above is a good general-purpose choice. The programming for this function, or any other, has the same form: the system of equations is set up with one equation for each data point, and then the system is solved in the required norm.

The main routine is Hermite(), named after the mathematician who defined these piecewise polynomials. The routine first gives the user the choice (by setting the variable flag) of either setting the k knots lambda[] on input or using the routine app_knots() to compute the knots. In most cases the user will not run the routine just once, but will compute a first approximation and then alter the position of the knots for a second. For a first approximation set flag to true and use app_knots() to compute the knots automatically. Then look at the result and choose new knots. A more sophisticated method automatically chooses the number of knots k and their position.
Once the knots are defined, the routine allocates space for the matrix A of size n x 2k. After making sure all elements of the matrix are zero, the routine calls coeff_cubic() to set up the coefficients of the matrix.
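Each data point contributes one row of A, and that row has only four nonzero entries: the Hermite basis weights for the knot interval containing x. The following sketch shows what a routine such as coeff_cubic() must compute for one row; the function name and the column layout (columns 2j through 2j+3 holding the coefficients of Yj, Yj', Yj+1, Yj+1') are assumptions of mine, not necessarily those of Listing 1.

```c
/* Fill the four nonzero entries of one row of the overdetermined
   matrix for a data point with abscissa x lying in [Xj, Xj1],
   where j is the (zero-based) index of that knot interval. */
void fill_row(double *row, int j, double x, double Xj, double Xj1)
{
    double d = Xj1 - Xj, p = x - Xj, q = Xj1 - x;
    double d3 = d * d * d;

    row[2*j]     = (d3 - 3.0*d*p*p + 2.0*p*p*p) / d3;  /* weight of Yj    */
    row[2*j + 1] = (d * p * q * q) / d3;               /* weight of Yj'   */
    row[2*j + 2] = (3.0*d - 2.0*p) * p * p / d3;       /* weight of Yj+1  */
    row[2*j + 3] = -(d * p * p * q) / d3;              /* weight of Yj+1' */
}
```

The right-hand side of the row is simply the measured y value, so assembling A and b is a loop over the data points calling a routine like this.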
Now the program solves the overdetermined system in the appropriate norm. The variable norm is set by the user to indicate which norm to use. (We do not give the three routines that solve the overdetermined system of equations here, as they require a lot of space, but the reader can find the algorithms in most computational mathematics textbooks.) The routine L1_approx() uses the L1-norm, the routine CH_lsq() uses the least squares norm, and the routine Cheby() uses the minimax norm. With the solution from the appropriate routine, the function now fits the data.
Some words on the other routines. First, the routine app_knots() will compute k knots lambda[j] so that in each interval (lambda[j], lambda[j+1]) there are approximately the same number of x values. This is a good starting point for our Hermite approximation and for any spline approximation that needs knots. The routine coeff_cubic() is merely a direct translation of the formulae. This routine uses interval(), which finds to which knot interval each x value belongs. coeff_cubic() also uses the macro SUB() to move around the matrix (this is my preferred method for dealing with matrices passed as parameters). Finally there is the routine app_cubic(). This routine uses the results from Hermite() to compute the piecewise polynomial for any value of x. Thus app_cubic() completes the curve fitting problem.
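The knot-placement strategy described for app_knots() can be sketched as follows, assuming the x values are sorted in ascending order. The routine name and interface here are mine; the real app_knots() may differ in detail.

```c
/* Place k knots so that each interval (knot[j], knot[j+1]) contains
   approximately the same number of data abscissae.  The knots are
   taken at index quantiles of the sorted x array, interpolating
   between neighboring values so knots need not coincide with data. */
void equal_count_knots(const double *x, int n, double *knot, int k)
{
    knot[0]     = x[0];        /* a = X1 */
    knot[k - 1] = x[n - 1];    /* Xk = b */

    for (int j = 1; j < k - 1; j++) {
        /* fractional index a fraction j/(k-1) of the way through the data */
        double t = (double)j * (n - 1) / (k - 1);
        int i = (int)t;
        knot[j] = x[i] + (t - i) * (x[i + 1] - x[i]);
    }
}
```

This gives the kind of first approximation the text recommends; the user then inspects the fit and moves knots by hand, or lets an automatic criterion adjust them.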
Example
An example (using data from actual measurements of the horizontal angular displacement of an elbow flexion against time) will show how the pieces fit together. There are 142 measured points, and the measurements are quite accurate (the experimenters knew the kind of instruments they were using; see the paper by Pezzack et al.). In this instance a close fit to the data points is required. In all the figures the dark circles are the knots and the crosses are the data points.

Figure 1 shows the result when the L1-norm is used, Figure 2 the result for the L2-norm, and Figure 3 the result for the minimax norm. As would be expected with such "clean" data, the answers are all quite good, the best being Figure 1.
To illustrate the behavior in the presence of noise, add some significant errors to the same data points. Using the same curve approximation method, Figure 4, Figure 5, and Figure 6 show the results when using the L1-norm, L2-norm, and minimax norm respectively. As theory suggests, the L1-norm gives definitely superior results. This example is a straightforward application of the method set out here; well, nearly! You may be asking the six-thousand-dollar question, "How do I choose the knots?" The answer is not straightforward, and contemporary research offers different answers.
As you can see from the figures, the number and position of the knots change from example to example. The goal is to choose the number of knots and their position so as to give the best fit possible for the norm chosen; this is easy to say but not easy to compute. All the knots in each figure have been chosen according to an information-theoretic criterion, plus a little experience in the placement of knots. The idea behind this method is to extract the maximum amount of information from the data points until only error remains. To do this we need a computable value for the amount of information contained in the errors si; we suggest using the Akaike Information Criterion. The routine changes the number of knots and their position until there is no more information in the errors. For those readers who wish to go further into this problem, see the papers by Brannigan for a full mathematical treatment of this method, the information-theoretic criterion, and an extension to multivariate data.
Bibliography
Pezzack, J.C., et al. "An Assessment of Derivative Determining Techniques Used for Motion Analysis," J. Biomechanics 10 (1977).

Brannigan, M. "An Adaptive Piecewise Polynomial Curve Fitting Procedure for Data Analysis," Commun. Statist. A10(18) (1981).
Brannigan, M. "Multivariate Data Modelling by Metric Approximants," Comp. Stats. & Data Analysis 2, (1985).