Peter is an assistant professor of computer science and mathematics at Rivier College in Nashua, New Hampshire.
From the factory floor to the stock market, just about any process that produces data can be modeled in quantitative terms. Once the model is built, you can go about experimenting with the system, trying to predict future behavior--and in many cases, successful prediction can mean making or saving a lot of money.
Neural networks are one approach to behavioral modeling, but require specialized knowledge of the neural models (of which there are dozens) along with an understanding of the subtleties of constructing and fine-tuning a network. These skills require years of training and experience and don't come cheaply. To some, it seems like black magic. My experiences with designing neural nets taught me that an intuitive "feel" for the data is just as important as training in the technology
Over time, other tools for intelligent decision-making, such as many expert-system shells, have evolved to where the subject-matter experts themselves are able to do a substantial amount of knowledge-base development. Rule-based languages such as Exsys and KBMS are intended to include the nonprogrammer as an active participant in the development team.
AIM, from AbTech Corporation, uses a different approach to the modeling problem, but one that is very similar, at least in concept, to the neural net (AbTech refers to the approach as "abduction technology;" thus its name). Each input parameter is placed into a polynomial, and different combinations of polynomials attempt to minimize the error between the derived output and the expected output. The result is a "polynomial network" that is straightforward, easily mastered by a non-expert, and the resulting algorithm runs far more quickly than a neural network.
It's natural to compare polynomial networks to neural nets, if only because of similar names and uses. The comparison is best made with the neural model known as "backpropagation," a supervised-training model that feeds back the error between the estimated and actual outputs.
Both are used to model nonlinear systems, and both use a network approach. Both use a training data set to create the network, and are tested and evaluated on a different data set. Even the result looks somewhat the same--at first glance, the polynomial in AIM can easily be taken for the more-complex operations that go on in a neural-net processing element.
Upon closer examination, however, similarities break down. The neural network uses a series of processing elements, arranged in layers, that combine and evaluate input data. AIM forms polynomials of individual input variables and cross-products of variables. It combines the results of a network of polynomials to produce an output.
The neural network is said to "train," based on the output error propagated back into each layer of the network. This error is used to adjust the weightings at each processing element in the network to produce a slightly different output for the next trial.
AIM, on the other hand, does no backward propagation. For each input variable, AIM computes the best fit between that input and the output using a simple, least-squares linear-regression function. This results in a series of linear functions that, individually, probably don't do a very good job of modeling the system. However, these linear functions are combined with one another to produce the polynomial network. AIM attempts to fit the training data to a number of polynomial networks, then analyzes the error to produce more refined models.
This combination is how the application supports nonlinear modeling. The final model is the combination of linear-regression functions that produce the least error, subject to certain constraints discussed shortly. Each polynomial equation may also contain cross-products from two variables, known as "doubles," or three variables, known as "triples."
One of the pitfalls of using any mathematical approach to modeling a system is that it is possible to reproduce the behavior of any system with a specific-enough model. The problem is that this model generalizes poorly, and is unable to model similar systems with any reliability. For example, a ten-degree polynomial can accurately model a system for which you have collected ten data points. However, if the ten data points are only a sampling of system behavior, completely modeling the behavior of those ten points makes the model less likely to accurately predict the behavior of the entire system, or of similar systems.
The same thing is true with neural networks, where it's possible to "overtrain" the network by reducing the acceptable mean-training error to extremely low levels. An overtrained network responds to spurious readings in the data, and does not generalize well past the original training data set.
To combat this, AIM uses a "complexity penalty" which is computed by KP=2CPMKS2/N where KP is the complexity penalty, CPM is the complexity penalty multiplier, K is the initial number of coefficients, N is the size of the training data set, and S2 is the sample estimate of the system variance.
The user can change the complexity penalty multiplier to produce a simpler or more complex network. The CPM defaults to 1; making the value larger increases the complexity penalty, which biases the software toward a less-complex network.
AIM also has two other adjustments that can affect the complexity of the model; see Figure 1. The first is the overfit penalty. The larger the value of the overfit penalty, the greater the number of more-complex polynomial networks that are not even considered. This makes the run time shorter and results in a more general model. The second factor is the carving limit. Setting this value high will enable AIM to automatically remove, or carve out, terms from individual polynomials that do not contribute much to the overall solution. This, once again, makes the overall model simpler.
In the article "Neural Nets for Predicting Behavior" (DDJ, February 1993), James Farley and I used a neural network approach to create a behavioral model of an electronic wind sensor. After hours of tinkering with neural network parameters, we achieved an average wind-speed error of 0.7 meters/second, and an average wind-direction error of 2.1 degrees.
I resolved to start with the same data sets we started with in that project, and use AIM, rather than neural nets, observing the differences in the process of model building. My first attempt with one data set produced an average wind-direction error of just over 14 degrees, or about the same as we had achieved with a neural net on the same data set. What wasn't the same was the amount of time it took to get to this point. The neural network required setting the number of processing elements, number of hidden layers, learning rule, transfer function, and a myriad of other factors. Total time to get to this point with neural networks: about two weeks. With AIM, it took ten minutes, plus the 96 seconds required to create the model using a 66 MHz 486.
Knowing where I was going to end up cut out a lot of the dead ends that went into the original neural-net model. Preparation of the data set and examination of the results took about half of the total time in the original project, and I was able to avoid much of a duplication of effort. While a good spreadsheet or statistical analysis package is still a necessity, AIM provides basic information on the result of the modeling process, as well as a basic data-viewing facility.
Even so, the time saved--and results achieved--by using AIM was impressive. The end result was that my AIM model produced an average error in wind speed of 0.75 meters per second, and an average error in wind direction of 2.7 degrees. This is slightly worse than the neural network, but was accomplished in the space of two afternoons.
My final polynomial network was about as complex as the complexity limitations allowed it to be; see Figures 2. This is because of the overall poor quality of the wind-sensor data, where the wind was filtered through a fine wire mesh surrounding the sensors. While it protected the sensors, this mesh also caused currents and eddies that confused any traditional attempt at modeling. I should note that the final neural-net model was also highly complex, with 3 hidden layers and 25 processing elements.
Like backprop neural networks, AIM can be used to model classification problems. Many existing statistical techniques exist for classifying data, including factor analysis, cluster analysis, and discriminant analysis. In addition, AIM itself uses linear regression as a starting point for its modeling process.
All of these are powerful techniques in exploratory data analysis, but offer only linear solutions. It is, of course, possible to make the data input into any of these techniques nonlinear, simply by adding a nonlinear transformation. However, any way of adding nonlinearity to a conventional approach assumes that you know the structure of the underlying, nonlinear model, something that neither AIM nor neural nets require.
Since AIM produces networks of polynomials, these are relatively easy to translate into C code. AIM does this automatically, producing generic but very well-commented C with the polynomials encoded (see Listing One, page 96). The user can choose the form of the input and output variables (individual variables read from, and written to, files, input and output with arrays, or input and output by use of external references), data type (integer or single- or double-precision reals), and the inclusion of debugging information. The code generated from all of the networks I created compiled without error using Turbo C++.
The code generated from AIM is also considerably simpler, running under 4 Kbytes, versus more than 10 Kbytes for the neural-net solution. An examination of the code shows that virtually all of this difference is in the definition of the polynomials in AIM versus the processing elements in the neural net. There's more complexity in each processing element, and more processing elements than polynomials.
The most disappointing thing about AIM is that, aside from setting the complexity penalty multiplier overfit penalty, and carving limit, the user has little input into the model building process. Out of necessity, this limits the size and scope of the model. In this case, AIM limits the size of the polynomials produced to three degrees, although the network approach also lets the model consider cross-products.
In even a single type of neural network, the user can constantly fiddle with the number of hidden layers, the number of processing elements in the hidden layers, the learning rules and transfer functions, the momentum, and a large number of other variables. Further, there are no hard and fast rules for making these changes. The selections are a result of equal parts experience, intuition, and an understanding of the underlying structure of the neural model.
As a data analyst, I found the process of adjusting the neural-network parameters to be exciting and challenging. They were also very time consuming. In my efforts, I produced polynomial models that were almost as good as the neural-net model in a fraction of the time.
Does this mean that all, or even most, prediction and classification problems should be solved with AIM's polynomial networks, rather than neural networks? Probably not. I was, after all, able to achieve somewhat better results on the wind sensor model using neural nets.
The question becomes one of degree of accuracy vs. cost of getting there. AIM is usable by just about any educated professional, and there is little need for intensive training or years of experience. However, in most cases it will only get you so far. Additionally, the lack of backward propagation of errors means that the models are not self-adjusting. With a more restrictive modeling technique, and no feedback, you'll only get the best model it is capable of producing, rather than the best possible.
While my comparisons have been with the backpropagation, neural-network model, there are other classes of neural networks loosely categorized as "unsupervised learning models." These models don't require output data to compare with their own computed outputs, so there is no error to propagate backward.
What then, do these models do? For the most part, they classify. Working in a manner similar to discriminant analysis or factor analysis, they look for outputs that logically group together based on some nonlinear combination of the inputs. Some of my current research is leading me to use unsupervised neural-net models to classify computer network traffic into fuzzy categories such as Lightly, Moderately, or Heavily loaded. AIM is only peripherally useful to this effort.
The conclusion, then, is that the neural-network model is more powerful and flexible, but anyone who has developed a neural network can tell you that it is hardly plug-and-play. Before turning to an often costly and time-consuming neural-network development effort, a system designer would be better served by spending a few hours with AIM. If it produces results within specification, there's no need to add the complexity of a neural net. In how many problems is this the case? I don't know every problem in every field, of course, but I'd be willing to bet that AIM could be the answer to many that now depend on neural nets.
_MODELING SYSTEMS WITH POLYNOMIAL NETWORKS_
by Peter D. Varhol
[LISTING ONE]
#include <stdio.h>
#define pow1(x) (x)
#define pow2(x) ((x)*(x))
#define pow3(x) ((x)*(x)*(x))
#define LIMIT(v,mn,mx) ((v>(mx))?(mx):((v<(mn))?(mn):v))
/*******************************************************
ABDUCTIVE NETWORK SUBROUTINE: AIMnet()
Generated by AbTech's Abductory Induction Mechanism
INPUTS: the following 5 double(s):
Var_2
Var_3
Var_7
Var_4
Var_1
OUTPUTS: Pointer(s) to the following 1 double(s):
Var_8
STATISTICS:
Var_2 : Mean = 0.098811, Sigma = 1.01098
Min = -1.349, Max = 1.52
Var_3 : Mean = 3.18207, Sigma = 0.575878
Min = 2.007, Max = 4.154
Var_7 : Mean = 20.7932, Sigma = 0.940033
Min = 18.8, Max = 22.5
Var_4 : Mean = -0.228888, Sigma = 0.779203
Min = -1.39, Max = 0.809
Var_1 : Mean = 3.19586, Sigma = 0.589959
Min = 1.999, Max = 4.212
Var_8 : Mean = 180, Sigma = 105.501
: Min = 0, Max = 360
: KP = 33.0%, FSE = 67.0%
: Predicted Error = 31.8675
*******************************************************/
AIMnet(Var_2,Var_3,Var_7,Var_4,Var_1,Var_8)
double Var_2 ; /* input variable */
double Var_3 ; /* input variable */
double Var_7 ; /* input variable */
double Var_4 ; /* input variable */
double Var_1 ; /* input variable */
double *Var_8 ; /* output variable */
{
double node2 ; /* working variable */
double node3 ; /* working variable */
double node7 ; /* working variable */
double node12 ; /* working variable */
double node4 ; /* working variable */
double node11 ; /* working variable */
double node10 ; /* working variable */
double node1 ; /* working variable */
double node9 ; /* working variable */
#ifdef DEBUG
printf("AIMnet: received Var_2 = %g.\n",(double)Var_2);
printf("AIMnet: received Var_3 = %g.\n",(double)Var_3);
printf("AIMnet: received Var_7 = %g.\n",(double)Var_7);
printf("AIMnet: received Var_4 = %g.\n",(double)Var_4);
printf("AIMnet: received Var_1 = %g.\n",(double)Var_1);
#endif /* DEBUG */
/* node2 -- Var_2 */
node2 = -0.0977376 + 0.989137*LIMIT(Var_2,-1.349,1.52);
/* node3 -- Var_3 */
node3 = -5.5256 + 1.73648*LIMIT(Var_3,2.007,4.154);
/* node7 -- Var_7 */
node7 = -22.1196 + 1.06379*LIMIT(Var_7,18.8,22.5);
/* node12 -- Triple */
node12 = 0 - 0.397622 - 1.28001*node2 + 0.214025*node3
- 0.790493*node7 + 0.241116*pow2(node2)
+ 0.232504*pow2(node3) + 0.154497*node2*node3
- 0.132984*node3*node7 - 0.241643*node2*node3*node7
+ 0.475993*pow3(node2) + 0.0166738*pow3(node3)
+ 0.259022*pow3(node7) ;
/* node4 -- Var_4 */
node4 = 0.293746 + 1.28336*LIMIT(Var_4,-1.39,0.809);
/* node11 -- Double */
node11 = 0 + 1.52034*node12 + 0.491108*node12*node4
- 0.614602*pow3(node12) ;
/* node10 -- Double */
node10 = 0 + 1.43086*node11 + 0.562305*node11*node4
- 0.543749*pow3(node11) ;
/* node1 -- Var_1 */
node1 = -5.41709 + 1.69503*LIMIT(Var_1,1.999,4.212);
/* node9 -- Triple */
node9 = 0 + 1.29422*node10 - 0.16041*node10*node1
+ 0.190084*node10*node3 - 0.182997*pow3(node10) ;
/* node8 -- Var_8 */
*Var_8 = 0 + 180 + 105.501*node9 ;
/* perform output limiting on Var_8 */
*Var_8 = LIMIT( *Var_8, 0, 360 );
#ifdef DEBUG
printf("AIMnet: returning Var_8 = %g.\n",*Var_8);
#endif /* DEBUG */
}
Copyright © 1993, Dr. Dobb's Journal