Fundamentals Of Curve Fitting

Steve Graham


Curve fitting is something of a misnomer; a better name might be "function finding" — the goal is to find a function that satisfies certain constraints. The constraints normally take one of two forms: a list of control points (samples from some process to be modelled), or a specific function (too difficult or costly to evaluate). These constraints mirror the two goals of curve fitting: interpolation and approximation.

Interpolation is the practice of using function values at some points (usually samples, with the actual function unknown) to estimate the value of the function at other points. If the function argument to be evaluated lies outside the range of the sampled points, the process is referred to as extrapolation. The simplest example is linear interpolation: find the two control points nearest the argument, assume the function corresponds to a straight line between the two points, and calculate the estimated function value. This may be familiar from interpolating log tables. Graphically, linear interpolation produces a "connect-the-dots" image from the control points.
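The "connect-the-dots" idea above translates directly into code. The following sketch (function and variable names are illustrative, not from any standard library) assumes the control points are sorted by their x coordinates:

```python
from bisect import bisect_right

def lerp(points, x):
    """Linearly interpolate between the two control points bracketing x.

    points: list of (x, y) pairs, sorted by x with distinct x values.
    """
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        # Outside the sampled range the process would be extrapolation.
        raise ValueError("argument outside sampled range")
    # Find the segment containing x; clamp so the last knot uses the
    # final segment rather than running past the table.
    i = min(bisect_right(xs, x), len(points) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    t = (x - x0) / (x1 - x0)          # fraction of the way along the segment
    return y0 + t * (y1 - y0)         # straight line between the two points
```

For example, with control points (1,0), (2,1), (3,4), `lerp` at x = 1.5 returns 0.5, the midpoint of the straight line between the first two points.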

Because of the cost of evaluating some functions, it is preferable to use a fast approximating function, as long as the approximation is sufficiently accurate. Since the original function is known, the implementor has more control over the number and placement of control points, as well as a standard to measure the accuracy of the fit against.

When fitting a curve to control points, there are two preliminary choices. The curve can be created from all the control points, or it can be created in segments, short curves between adjacent control points. The curve can be required to match the control points exactly or the curve may be allowed to diverge, creating local errors in the interest of minimizing global error.

For N control points, a unique polynomial of degree N-1 can match the points precisely. Lagrange's interpolation formula provides an expression for the polynomial:

P(x) = Σ yᵢ · Π (x - xⱼ)/(xᵢ - xⱼ)   (sum over i = 1…N, product over j ≠ i)

Suppose we want to find a quadratic polynomial through the points (1,0), (2,1) and (3,4). Lagrange's formula yields

P(x) = 0·(x - 2)(x - 3)/((1 - 2)(1 - 3)) + 1·(x - 1)(x - 3)/((2 - 1)(2 - 3)) + 4·(x - 1)(x - 2)/((3 - 1)(3 - 2))

which simplifies to

P(x) = x² - 2x + 1 = (x - 1)²
Spline techniques treat segments individually, producing a composite curve from the segments. A spline was a flexible ruler used in drafting. The spline was anchored to the control points (referred to as "knots"), allowing a smooth curve to be drawn. The functions fitting individual segments are generally simple. A popular choice is a cubic polynomial (hence, "cubic splines"), of the form

P(x) = ax³ + bx² + cx + d

Cubic splines produce a relatively smooth curve, since adjacent segments are normally forced to have identical slope (first derivative) and curvature (second derivative) at the knots. Constraints may be stated by requiring continuity of the first and second derivatives of the composite function at the knots. The constraints and the knot coordinates can be used to determine the coefficients for the fitting cubic polynomial. The first and last knots cause a problem, since there is no adjacent segment to constrain the values for the slope and curvature; other criteria must determine the coefficients for the end segments, such as the "natural" spline that sets the curvature at the first and last knots to zero.
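A sketch of the natural spline described above, assuming sorted, distinct knots (the function names are illustrative). The continuity constraints reduce to a tridiagonal linear system for the curvature at each interior knot, solved here by simple forward elimination and back substitution; the curvatures at the first and last knots are pinned to zero:

```python
from bisect import bisect_right

def natural_spline(xs, ys):
    """Return an evaluator for the natural cubic spline through the knots."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]   # segment widths
    # Tridiagonal system for the second derivatives M[1..n-2];
    # M[0] = M[n-1] = 0 is the "natural" boundary condition.
    a = [h[i - 1] for i in range(1, n - 1)]                 # sub-diagonal
    b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]  # diagonal
    c = [h[i] for i in range(1, n - 1)]                     # super-diagonal
    d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
         for i in range(1, n - 1)]
    # Forward elimination, then back substitution (Thomas algorithm).
    for i in range(1, len(b)):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * n
    for i in range(len(b) - 1, -1, -1):
        m[i + 1] = (d[i] - c[i] * m[i + 2]) / b[i]

    def s(x):
        # Locate the segment containing x, clamping at the ends.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)
    return s
```

The returned function passes through every knot exactly, and adjacent segments share first and second derivatives there.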

Two popular measures of the quality of a curve fit are the least squares method and the minimax method. The least squares techniques seek to minimize the sum of the squares of the divergence between the approximating function and the control points or original function. Minimax techniques use the alternative aim of minimizing the maximum error between the new function and the source data or function.
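The simplest least squares fit, a straight line y = ax + b through a set of control points, has a closed-form solution obtained by setting the derivatives of the summed squared error to zero. A minimal sketch (names are illustrative):

```python
def least_squares_line(points):
    """Fit y = a*x + b minimizing the sum of squared residuals."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Normal equations for the two coefficients.
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```

Unlike interpolation, the fitted line need not pass through any of the control points; it only minimizes the total squared divergence.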

Chebyshef (also, Chebyshev, Chebichev or Tchebycheff) fitting relies on a combination of Chebyshef polynomials to approximate the function. A Chebyshef polynomial has the form

Tₙ(x) = cos(n arccos x)

Note the recurrence relation between subsequent Chebyshef polynomials:

T₀(x) = 1
T₁(x) = x
T₂(x) = 2x² - 1
T₃(x) = 4x³ - 3x
Tₙ₊₁(x) = 2x·Tₙ(x) - Tₙ₋₁(x),  n >= 1
Clenshaw's recurrence formula provides an efficient way of evaluating such series.
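Both the recurrence and Clenshaw's formula are short to code. A sketch (names are illustrative), where `clenshaw` evaluates a sum c₀T₀(x) + c₁T₁(x) + … + cₙTₙ(x) without computing any Tₖ explicitly:

```python
def cheb(n, x):
    """T_n(x) via the recurrence T_{n+1} = 2x*T_n - T_{n-1}."""
    t0, t1 = 1.0, x
    if n == 0:
        return t0
    for _ in range(n - 1):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t1

def clenshaw(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) by Clenshaw's recurrence."""
    b1 = b2 = 0.0
    # Run the recurrence backward from the highest coefficient.
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]
```

The identity Tₙ(x) = cos(n arccos x) offers a direct check of `cheb` on [-1, 1], and `clenshaw` agrees with summing the terms individually.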

Aliasing And Optimization

When a programming language allows two apparently different access mechanisms to address the same data elements, this is referred to as aliasing. Having multiple names for the same objects is considered bad style and discouraged, since it adds to the complexity of understanding a program. The possibility of aliasing is almost impossible to eliminate from programming languages: subroutine calls with one data object used for two arguments and mapped to two different formal parameters can cause aliasing. The most common cause of aliasing problems is access to data through a pointer, where two different references could access the same data.
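The subroutine-call case is easy to demonstrate. In this small Python illustration (the names are hypothetical), `src[0]` looks like a value that could safely be loaded once and reused, but when the same list is passed for both parameters the first iteration silently changes it:

```python
def add_first(dst, src):
    """Add the first element of src to every element of dst."""
    for i in range(len(dst)):
        # A compiler tempted to hoist src[0] out of the loop would be
        # wrong here: if dst and src alias, the write below changes it.
        dst[i] += src[0]

a = [1, 2, 3]
add_first(a, a)        # dst and src are the same object: a is now [2, 4, 5]

b, c = [1, 2, 3], [1, 2, 3]
add_first(b, c)        # distinct objects: b is now [2, 3, 4]
```

The two calls produce different results even though the argument values start out identical, which is exactly why a compiler cannot reuse a saved value across a write unless it can prove no aliasing occurs.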

Besides interfering with comprehension, aliasing becomes an optimization problem for subroutines. The simplest and most common optimization performed by compilers is the reuse of values. If X has already been loaded into a register for use by the previous statement, it needn't be loaded again. If an intermediate value (a subexpression), say B*B - 4*A*C, has been calculated once, it needn't be calculated again. Two problems can interfere with reusing values: insufficient registers to hold the values and possible changes to constituent variables. The available registers are a simple consequence of the CPU's architecture and the compiler's conventions for register use. Possible changes arise because of aliasing and side effects. The compiler must guarantee correctness, so if aliasing is possible and a write occurs, saved values must be flushed, since they are no longer reliably accurate. Similarly, if a function call occurs that could cause side effects — even if the function call mechanism preserves register contents — the saved values are no longer necessarily correct.

In practice, possible aliasing and function calls should be a concern in tight inner loops that are executed repeatedly. Inline code and careful statement ordering can permit compilers to perform optimizations that otherwise might introduce errors.

The term "aliasing" is also used in graphics and sampling to refer to artifacts created by the granularity of the sample or representation; such aliasing may create jagged edges in images.