This article contains the following executables: CASEY.ARC
Casimir C. "Casey" Klimasauskas is the founder of Neural-Ware Inc., a supplier of neural network development systems and services. Prior to that he worked extensively in machine vision and robotics. He can be reached at Penn Center West IV-227, Pittsburgh, PA 15276; 412-787-8222.
One of the key areas in which neural networks and expert systems differ is in how they internalize the information which they contain. In traditional expert systems, information is kept in the form of rules. When an expert system arrives at a conclusion, you can ask it to look back through its database of rules and find out which ones were activated in order to reach the current conclusion. In the case of a neural network, this kind of explanation is not so easy to obtain -- because neural nets store information in the form of "weights" or "synaptic strengths." Weights in a network have numerical rather than symbolic values, making it hard to elicit a symbolic explanation.
But it is possible to learn more about how the various inputs affect the current output of a neural network. This is accomplished by applying a technique from the field of nonparametric statistical analysis called "sensitivity analysis."
This article describes sensitivity analysis and how it can be applied to "explaining" the thinking in a neural network. An example, complete with C code, is provided and described in depth. The example is a complete, self-contained neural net processor implementing the Back-Propagation model. This article assumes you have some familiarity with neural nets and with the Back-Propagation model. (For more information, see "Untangling Neural Nets" by Jet Lawerence, DDJ April 1990.)
Sensitivity analysis looks at each of the inputs and determines how much and what kind of impact a small change in inputs would have on the output. If a small change in a particular input causes a drastic change in a particular output, that input might be considered one of the key factors in the current outcome.
The actual process of applying sensitivity analysis works as follows: start with the first input to the network and add a small amount to it, then recompute the outputs; next, subtract the same small amount from that input and recompute the outputs again; finally, take the difference between the two sets of outputs (the change in the output) and divide it by the total amount the input changed. This results in M values, one for each output. The process is repeated for each of the inputs to the network. The result is an MxN matrix, where M is the number of outputs and N is the number of inputs. The small amount by which the input is changed is called the "dither."
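The procedure above can be sketched in a few lines of C. The recall() function here is only a stand-in with a fixed, known mapping (an assumption made so the result can be checked by hand), not the network recall routine from the listing:

```c
#include <math.h>
#include <assert.h>

#define N_IN  2                     /* N: number of network inputs  */
#define N_OUT 1                     /* M: number of network outputs */

/* Stand-in for the network's recall step: a fixed, known mapping
   (assumed here for illustration). */
static void recall(const double in[N_IN], double out[N_OUT])
{
    out[0] = 3.0 * in[0] - 2.0 * in[1];
}

/* Fill sens[m][n] with the change in output m divided by the total
   change in input n, dithering each input up and then down. */
static void sensitivity(double in[N_IN], double sens[N_OUT][N_IN],
                        double dither)
{
    double up[N_OUT], down[N_OUT];
    int m, n;

    for (n = 0; n < N_IN; n++) {
        double save = in[n];
        in[n] = save + dither;  recall(in, up);
        in[n] = save - dither;  recall(in, down);
        in[n] = save;                         /* restore the input */
        for (m = 0; m < N_OUT; m++)
            sens[m][n] = (up[m] - down[m]) / (2.0 * dither);
    }
}
```

For this linear stand-in the MxN result is exact (3.0 and -2.0); for a real network it is only a local estimate, valid near the current inputs.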
As an aside, this basic technique is used internally in a variety of neural networks to determine which processing element should be modified. In particular, it is used in certain variants of the Madaline, and also in certain network optimization techniques.
As an example of how sensitivity analysis works, suppose a neural network has been trained to determine credit worthiness. The inputs to this network are information from a credit application, including income, expenses, number of credit cards, own/rent, and so on. For the current applicant, say the output of the network is 0.3 (<.5 is deny). Table 1 shows the effect of a small change in each input on the outputs.
Input                New Output   Change in Output
--------------------------------------------------
Income up 10%           .60            +.30
Income down 10%         .15            -.15
Expenses up 10%         .25            -.05
Expenses down 10%       .40            +.10
Rent --> own            .31            +.01
Which of the inputs would you have to change least to switch the current "deny" (output = 0.3) to an "accept"? The easiest would be to require the applicant to increase his or her income by 10 percent, resulting in an "accept" (output = 0.6). As to why credit was denied, the primary reason is that the applicant's income is too low. From Table 1, it makes almost no difference if the person were to go from renting to owning a house. On the other hand, a secondary reason for denying credit might be that the applicant's expenses are too high. If expenses were reduced by 20 to 30 percent (rather than 10 percent), this might be enough to raise the credit evaluator output over the accept/deny threshold of 0.5.
It is very important to note that the way in which each of these input variables affects the output is dependent on their current values. For another applicant, the most important factor might be the rent/own status.
In many ways, this is analogous to how individuals face decisions. When it is difficult to make a decision between two courses of action, it is usually because there is one particular factor (though there may be more) which is right on the border line. If it were a little more one way or the other, the decision would be clear cut.
For example, my company (NeuralWare) offers an introductory software package for $99.00 as well as a sophisticated software package for $1895.00. Would you spend $99.00 for a software package to learn about neural networks? I would, almost without thinking (buy level = 1.0). Would I spend $1895.00? Probably not, unless I had a specific application and wanted to make sure I had the best tools possible to ensure my success (education buy level = 0.1, application buy level = 0.8). Would I spend $200.00? Maybe (buy level = 0.5). $500.00? Not unless I really, really wanted it (buy level = 0.3).
So, faced with the decision as to how much I would pay to purchase a software package to learn about neural networks, the higher the price, the less I would be willing to purchase it. Below $150.00, I would buy it without a thought. Over that, I need additional compelling motivation.
But if I had a particular application in mind, would I pay $1895 for the product? Yes, based on appropriate capabilities (buy level = 0.8). $3,000? Yes, again based on appropriate capabilities (buy level = 0.7). $500? Maybe, though I might want to look very carefully at product reviews (buy level = 0.5). Why would I feel more comfortable paying $2,000 or $3,000 for a product to solve a particular application? Because I know from experience that good tools cost money. Inexpensive ones often cost more in wasted effort than the money saved (with some notable exceptions).
Look carefully at Table 2. Notice that a product purchased for education is very price-sensitive when the price is over $150. When purchasing a tool for application development, above a certain price threshold, other factors become more important and price is not an issue. One key observation from this table is that the way in which price affects my willingness to purchase is dependent on my purpose for purchasing the product. This is true of sensitivity analysis in general: the resulting changes in the output are valid only for the current set of inputs. The fact that a small change in a particular input has no effect on the output does not mean that input is unimportant. Combined with some other set of inputs, a small change in it may have a very large impact on the output.
Purpose      Cost in $   Buy Level   Change for 10%   Sensitivity
                                     Change in Price
-----------------------------------------------------------------
Education        99.00      1.0           0.00            0.0
Education       150.00      1.0          -0.20           -2.0
Education       200.00      0.5          -0.10           -1.0
Education       500.00      0.3          -0.05           -0.5
Education     1,895.00      0.1          -0.01           -0.1
Application      99.00      0.1          +0.10           +1.0
Application     500.00      0.5          +0.10           +1.0
Application   1,895.00      0.8          -0.01           -0.1
Application   3,000.00      0.7          -0.01           -0.1
Listing One (page 78) shows a neural network program designed to train a Back-Propagation network and then have it "explain" itself using sensitivity analysis. A main routine selects which function to perform and allows the network and explanations to be printed to a file for later inspection. Because of space constraints, the complete network (network.net) is provided electronically along with the program source and executable (see "Availability," page 3). The program was written in Zortech C, Version 2.1, but should compile readily under Microsoft C or Turbo C.
The program is organized in a fairly straightforward fashion. Utility routines for loading, saving, deleting, creating, and printing the network are at the beginning. These are invoked directly from the main program. The routine Recall is used to perform a single execution of the network to determine its response to a set of inputs. Learn performs a single update of the network weights. The standard Delta-Rule with Momentum is used for training the weights (see Rumelhart and McClelland's Parallel Distributed Processing, volume 1, chapter 8 for more details). Train repeatedly invokes Learn, passing it examples from the static training set (testE[ ]). To ensure that all of the data examples are presented, a "shuffle and deal" strategy is used. This is implemented in NextTestN, which creates a shuffled list of examples, deals it out to the test program, and reshuffles when the stack is empty.
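The shuffle-and-deal idea can be sketched on its own. The deck size and the use of rand() here are assumptions for illustration; this is not the listing's actual NextTestN routine:

```c
#include <stdlib.h>
#include <assert.h>

#define N_EXAMPLES 8            /* size of the training set (assumed) */

/* Deal training-example indices from a shuffled "deck": every index
   is dealt exactly once per pass, and the deck is reshuffled
   (Fisher-Yates) whenever it runs out. */
static int next_test(void)
{
    static int deck[N_EXAMPLES];
    static int left = 0;                /* indices still in the deck */
    int i;

    if (left == 0) {                    /* deck empty: reshuffle */
        for (i = 0; i < N_EXAMPLES; i++)
            deck[i] = i;
        for (i = N_EXAMPLES - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = deck[i]; deck[i] = deck[j]; deck[j] = t;
        }
        left = N_EXAMPLES;
    }
    return deck[--left];
}
```

Unlike drawing examples purely at random, this guarantees each example appears exactly once per pass, so none is starved during training.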
(See the accompanying text box for details on running the program. Every precaution has been taken to make the program as easy to compile as possible under other C compilers. From experience, there should be no problem with Microsoft C or Turbo C compilers.)
The network was trained until the root-mean-square error was less than 0.001. Table 4(a) shows the output of the network over the range of inputs 1 and 2. This corresponds very well to the training inputs in the program (testE[ ] array). The training set was designed to make input 3 "random." If it were completely random, the weights connected to it should be close to 0. Table 3 is a printout of the network. Notice that the magnitude of the weights associated with input 3 (highlighted in the table) are almost all very small in comparison to the weights for the other inputs to each processing element. This means that input 3 will typically have little effect on the output of the network. Processing element 0 is the "bias" term, and its output is always 1.0.
Layer 0 with 3 PEs
1 1abb:5b12 PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=0
2 1abb:5b32 PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=0
3 1abb:5b52 PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=0
Layer 1 with 5 PEs
4 1abb:5b72 PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=4
Src= 0 1abb:00d6 Weight= 1.591 Delta Wt= 0.000
Src= 1 1abb:5b12 Weight= 3.830 Delta Wt= 0.000
Src= 2 1abb:5b32 Weight= -16.104 Delta Wt= 0.000
Src= 3 1abb:5b52 Weight= 0.486 Delta Wt= 0.000
5 1abb:5bba PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=4
Src= 0 1abb:00d6 Weight= 14.082 Delta Wt= 0.000
Src= 1 1abb:5b12 Weight= -15.446 Delta Wt= 0.000
Src= 2 1abb:5b32 Weight= -16.720 Delta Wt= 0.000
Src= 3 1abb:5b52 Weight= 2.472 Delta Wt= 0.000
6 1abb:5c02 PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=4
Src= 0 1abb:00d6 Weight= 3.883 Delta Wt= 0.000
Src= 1 1abb:5b12 Weight= 2.332 Delta Wt= 0.000
Src= 2 1abb:5b32 Weight= -20.745 Delta Wt= 0.000
Src= 3 1abb:5b52 Weight= 1.366 Delta Wt= 0.000
7 1abb:5c4a PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=4
Src= 0 1abb:00d6 Weight= -1.377 Delta Wt= 0.000
Src= 1 1abb:5b12 Weight= -3.519 Delta Wt= 0.000
Src= 2 1abb:5b32 Weight= -3.751 Delta Wt= 0.000
Src= 3 1abb:5b52 Weight= 0.545 Delta Wt= 0.000
8 1abb:5c92 PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=4
Src= 0 1abb:00d6 Weight= 4.869 Delta Wt= 0.000
Src= 1 1abb:5b12 Weight= -10.058 Delta Wt= 0.000
Src= 2 1abb:5b32 Weight= -8.766 Delta Wt= 0.000
Src= 3 1abb:5b52 Weight= 3.108 Delta Wt= 0.000
Layer 2 with 2 PEs
9 1abb:5cda PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=9
Src= 0 1abb:00d6 Weight= -20.434 Delta Wt= 0.000
Src= 1 1abb:5b12 Weight= 46.298 Delta Wt= 0.000
Src= 2 1abb:5b32 Weight= -2.895 Delta Wt= 0.000
Src= 3 1abb:5b52 Weight= -0.492 Delta Wt= 0.000
Src= 4 1abb:5b72 Weight= -21.351 Delta Wt= 0.000
Src= 5 1abb:5bba Weight= 20.866 Delta Wt= 0.000
Src= 6 1abb:5c02 Weight= -26.287 Delta Wt= 0.000
Src= 7 1abb:5c4a Weight= 1.304 Delta Wt= 0.000
Src= 8 1abb:5c92 Weight= 8.782 Delta Wt= 0.000
10 1abb:5d54 PE Output= 0.000 Error= 0.000 WorkR= 0.000 NConns=9
Src= 0 1abb:00d6 Weight= 20.434 Delta Wt= 0.000
Src= 1 1abb:5b12 Weight= -46.298 Delta Wt= 0.000
Src= 2 1abb:5b32 Weight= 2.896 Delta Wt= 0.000
Src= 3 1abb:5b52 Weight= 0.493 Delta Wt= 0.000
Src= 4 1abb:5b72 Weight= 21.338 Delta Wt= 0.000
Src= 5 1abb:5bba Weight= -20.865 Delta Wt= 0.000
Src= 6 1abb:5c02 Weight= 26.300 Delta Wt= 0.000
Src= 7 1abb:5c4a Weight= -1.292 Delta Wt= 0.000
Src= 8 1abb:5c92 Weight= -8.783 Delta Wt= 0.000
Table 4(b), Table 4(c), and Table 4(d) show the sensitivity of the output to small changes in inputs 1, 2, and 3, respectively. The tables range over the possible input values (input 1 along the x-axis; input 2 along the y-axis). Notice from these tables that the output is most sensitive when the particular variable is near a border. If you think about it a moment, this makes sense. In a region away from a border, small changes in the inputs will have little effect on the output. Only near a border, where the output changes from one state to another, will the appropriate input have a large impact. This suggests some modifications to the program. In particular, if a small change in each of the inputs produces little or no change in the outputs, increase the magnitude of the change in input until the change is significant.
In Table 4(b), which looks at the sensitivity of the output to a change in input 1, notice that the output is most sensitive when input 1 is in the range 0.45 to 0.55 and input 2 is in the range 0.7 to 1.0. This is where the output of the network makes the transition from 0 to 1. Likewise, in Table 4(c), the output is most sensitive to input 2 when input 2 is in the range 0.25 to 0.35 and 0.65 to 0.8. These are where the network makes a transition from 0 to 1 and 1 to 0 along the input 2 axis.
In looking at Table 4(b), Table 4(c), and Table 4(d), you will notice that the output is always sensitive to the input when there is a transition. However, the training set has very sharp boundaries, and only one variable changes at each of them. This might lead you to suspect some kind of problem. Actually, the problem results from certain basic characteristics of how Back-Propagation creates its output. It takes a series of humps and fits them together to "construct" the resulting mapping from inputs to outputs. As a result, the borders are a little lumpy, making the output somewhat sensitive to all of the inputs.
Another aside. Look back at the first example of credit approval. The changes to the input variables were expressed as percentages of the unpreprocessed inputs. Depending on how those input variables were transformed, this may have represented a larger or smaller change in one or more actual inputs to the network. In practice, it is important to vary the raw input data and measure the changes in the output based on that.
The program as supplied contains several opportunities for further exploration. As described earlier, dynamically making the dither larger provides interesting effects. To a certain degree, a level of confidence can be derived from how large the dither is. A small dither resulting in large sensitivity means a less confident conclusion than no sensitivity at all. In the program, the effects of a positive and negative dither have been averaged. Another modification would be to report the change in each direction separately.
Setting a higher error threshold on training results in fuzzier boundaries. This tends to make the network less sensitive to all inputs. It also creates additional "anomalous" regions where the "bumps" which Back-Propagation uses have not fit together.
The data set which is hard-coded into the program was designed specifically to illustrate how sensitivity analysis works. Try constructing other data sets which have fuzzier and nonrectangular boundaries.
Here are the steps in running the example program. The following sequence will create the network.exe program and replicate the tables shown in this article.
1. Compile the source program "network.c" (I used Zortech C,
   Version 2.1, though Microsoft C or Turbo C should work as well).
   a. If you have a math coprocessor:
      C> ztc -ms -f -o+all network.c
   b. If you do not have a math coprocessor:
      C> ztc -ms -o+all network.c
   Due to intermediate round-off errors, there will be slight
   differences in the operation of the program created with inline
   floating point versus software floating point. The tables in
   this article were produced with software floating point.
2. Run the program: C> network
3. Create a new network: What do you want to do? c
   ... network is created.
4. Train the network (this took 3 hours on a Toshiba 5200):
   What do you want to do? t ... displays messages about the
   current "pass."
5. Save the network to "network.net" (default):
   What do you want to do? s
6. Print out the network weights:
   What do you want to do? p network.lst
7. Analyze how sensitive the network is to various inputs:
   What do you want to do? e network.exp
8. Quit: What do you want to do? x
_NEURAL NETS TELL WHY_
by Casimir C. "Casey" Klimasauskas

[LISTING ONE]
/* network.c -- Backprop network with explain function */
/************************************************************************
* *
* Explain How a Neural Network "thinks" using Sensitivity *
* Analysis *
* *
************************************************************************
This is a program designed to implement a back-propagation network
and show how sensitivity analysis works to "explain" the network's
reasoning.
*/
#include <stdio.h> /* file & printer support */
#include <stdlib.h> /* malloc, rand, RAND_MAX & other things */
#include <math.h> /* exp() */
#include <dos.h> /* FP_SEG(), FP_OFF() */
#define MAX_LAYER 4 /* maximum number of layers */
#define MAX_PES 50 /* maximum number of PEs */
/* --- locally required defines --- */
#define WORD(x) (((short *)(&x))[0])
#define MAX(x,y) ((x)>(y)?(x):(y))
typedef float WREAL; /* work real */
/* --- Connection structure --- */
typedef struct _conn { /* connection */
struct _pe *SourceP; /* source pointer */
WREAL WtValR; /* weight value */
WREAL DWtValR; /* delta weight value */
} CONN;
typedef struct _pe { /* processing element */
struct _pe *NextPEP; /* next PE in layer */
WREAL OutputR; /* output of PE */
WREAL ErrorR; /* work area for error */
WREAL WorkR; /* work area for explain */
int PEIndexN; /* PE index (ordinal) */
int MaxConnsN; /* maximum number of connections */
int NConnsN; /* # of connections used */
CONN ConnsS[1]; /* connections to this PE */
} PE;
PE *LayerTP[MAX_LAYER+1] = {0}; /* pointer to PEs in layer */
int LayerNI[MAX_LAYER+1] = {0}; /* # of items in each layer */
PE *PEIndexP[MAX_PES] = {0}; /* index into PEs */
int NextPEXN = {0}; /* index of next free PE */
PE PEBias = { 0, 1.0, 0.0, 0, }; /* "Bias" PE */
/************************************************************************
* *
* RRandR() - compute uniform random number over a range *
* *
************************************************************************
*/
double RRandR( vR ) /* random value over a range */
double vR; /* range magnitude */
{
double rvR; /* return value */
/* compute random value in range 0..1 */
rvR = ((double)rand()) / (double)RAND_MAX;
/* rescale to range -vR..vR */
rvR = vR * (rvR + rvR - 1.0);
return( rvR );
}
/************************************************************************
* *
* AllocPE() - Allocate a PE dynamically *
* *
************************************************************************
*/
PE *AllocPE( peXN, MaxConnsN ) /* allocate A PE dynamically */
int peXN; /* index of PE (0=auto) */
int MaxConnsN; /* max number of connections */
{
PE *peP; /* pointer to PE allocated */
int AlcSize; /* size to allocate */
if ( NextPEXN == 0 ) {
PEIndexP[0] = &PEBias; /* bias PE */
NextPEXN++;
}
if ( peXN == 0 )
peXN = NextPEXN++;
else if ( peXN >= NextPEXN )
NextPEXN = peXN+1;
if ( peXN < 0 || MAX_PES <= peXN ) {
printf( "Illegal PE number to allocate: %d\n", peXN );
exit( 1 );
}
if ( PEIndexP[peXN] != (PE *)0 ) {
printf( "PE number %d is already in use\n", peXN );
exit( 1 );
}
AlcSize = sizeof(PE) + MaxConnsN*sizeof(CONN);
peP = (PE *)malloc( AlcSize );
if ( peP == (PE *)0 ) {
printf( "Could not allocate %d bytes for PE number %d\n",
AlcSize, peXN );
exit( 1 );
}
memset( (char *)peP, 0, AlcSize );
peP->MaxConnsN = MaxConnsN+1; /* max number of connections */
peP->PEIndexN = peXN; /* self index for load/save */
PEIndexP[peXN] = peP; /* key for later */
return( peP );
}
/************************************************************************
* *
* AllocLayer() - Dynamically allocate PEs in a layer *
* *
************************************************************************
*/
int AllocLayer( LayN, NPEsN, NConnPPEN ) /* allocate a layer */
int LayN; /* layer number */
int NPEsN; /* # of PEs in layer */
int NConnPPEN; /* # of connections per PE */
{
PE *peP; /* PE Pointer */
PE *apeP; /* alternate PE pointer */
int wxN; /* general counter */
/* Sanity check */
if ( LayN < 0 || sizeof(LayerTP)/sizeof(LayerTP[0]) <= LayN ) {
printf( "Layer number (%d) is out of range\n", LayN );
exit( 1 );
}
/* Allocate PEs in the layer & link them together */
LayerNI[LayN] = NPEsN;
for( wxN = 0; wxN < NPEsN; wxN++, apeP = peP ) {
peP = AllocPE( 0, NConnPPEN+1 ); /* allocate next PE */
if ( LayerTP[LayN] == (PE *)0 ) {
LayerTP[LayN] = peP; /* insert table pointer */
} else {
apeP->NextPEP = peP; /* link forward */
}
}
return( 0 );
}
/************************************************************************
* *
* FreeNet() - Free network memory *
* *
************************************************************************
*/
int FreeNet()
{
int wxN; /* work index */
char *P; /* work pointer */
for( wxN = 1; wxN < MAX_PES; wxN++ ) {
if ( (P = (char *)PEIndexP[wxN]) != (char *)0 )
free( P );
PEIndexP[wxN] = (PE *)0;
}
NextPEXN = 0;
for( wxN = 0; wxN < MAX_LAYER; wxN++ ) {
LayerTP[wxN] = (PE *)0;
LayerNI[wxN] = 0;
}
return( 0 );
}
/************************************************************************
* *
* SaveNet() - Save Network *
* *
************************************************************************
*/
int SaveNet( fnP ) /* save network */
char *fnP; /* name of file to save */
{
int wxN; /* work index */
FILE *fP; /* file pointer */
PE *peP; /* PE pointer for save */
CONN *cP; /* connection pointer */
int ConnXN; /* connection index */
if ( NextPEXN <= 1 )
return( 0 ); /* nothing to do */
if ( (fP = fopen(fnP, "w")) == (FILE *)0 ) {
printf( "Could not open output file <%s>\n", fnP );
return( -1 );
}
/* --- save all of the PEs --- */
for( wxN = 1; wxN < NextPEXN; wxN++ ) {
peP = PEIndexP[wxN];
if ( peP == (PE *)0 ) {
fprintf( fP, "%d 0 PE\n", wxN );
continue;
}
fprintf( fP, "%d %d PE\n", wxN, peP->NConnsN );
for( ConnXN = 0; ConnXN < peP->NConnsN; ConnXN++ ) {
cP = &peP->ConnsS[ConnXN];
fprintf( fP, "%d %.6f %.6f\n",
cP->SourceP->PEIndexN, cP->WtValR, cP->DWtValR );
}
}
fprintf( fP, "%d %d END OF PES\n", -1, 0 );
/* --- save information about how layers are assembled --- */
for( wxN = 0; wxN < MAX_LAYER; wxN++ ) {
if ( (peP = LayerTP[wxN]) == (PE *)0 )
continue;
fprintf( fP, "%d LAYER\n", wxN );
do {
fprintf( fP, "%d\n", peP->PEIndexN );
peP = peP->NextPEP;
} while( peP != (PE *)0 );
fprintf( fP, "-1 End Layer\n" ); /* end of layer */
}
fprintf( fP, "-1\n" ); /* no more layers */
fclose( fP );
return( 0 );
}
/************************************************************************
* *
* LoadNet() - Load Network *
* *
************************************************************************
*/
int LoadNet( fnP ) /* load a network file */
char *fnP; /* file name pointer */
{
int wxN; /* work index */
FILE *fP; /* file pointer */
PE *peP; /* PE pointer for save */
PE *lpeP; /* last pe in chain */
int LayN; /* layer number */
CONN *cP; /* connection pointer */
int ConnXN; /* connection index */
int PEN; /* PE number */
int PENConnsN; /* # of connections */
int StateN; /* current state 0=PEs, 1=Layers */
float WtR, DWtR; /* weight & delta weight */
char BufC[80]; /* work buffer */
fP = (FILE *)0;
if ( (fP = fopen( fnP, "r" )) == (FILE *)0 ) {
printf( "Could not open output file <%s>\n", fnP );
return( -1 );
}
FreeNet(); /* release any existing network */
StateN = 0;
while( fgets( BufC, sizeof(BufC)-1, fP ) != (char *)0 ) {
switch( StateN ) {
case 0: /* PEs */
sscanf( BufC, "%d %d", &PEN, &PENConnsN );
if ( PEN < 0 ) {
StateN = 2;
break;
}
peP = AllocPE( PEN, PENConnsN ); /* allocate PE */
cP = &peP->ConnsS[0]; /* Pointer to Conns */
ConnXN = PENConnsN;
if ( ConnXN > 0 )
StateN = 1; /* scanning for conns */
break;
case 1: /* PE Connections */
sscanf( BufC, "%d %f %f", &PEN, &WtR, &DWtR );
WORD(cP->SourceP) = PEN;
cP->WtValR = WtR;
cP->DWtValR = DWtR;
cP++; /* next connection area */
peP->NConnsN++; /* count connections */
if ( --ConnXN <= 0 )
StateN = 0; /* back to looking for PEs */
break;
case 2: /* Layer data */
sscanf( BufC, "%d", &LayN );
StateN = 3;
if ( LayN < 0 )
goto Done;
lpeP = (PE *)&LayerTP[LayN];
break;
case 3: /* layer items */
sscanf( BufC, "%d", &PEN );
if ( PEN < 0 ) {
StateN = 2;
break;
}
LayerNI[LayN]++; /* update # of PEs */
peP = PEIndexP[PEN]; /* point to PE */
lpeP->NextPEP = peP; /* forward chain */
lpeP = peP;
break;
}
}
Done:
/* go through ALL PEs and convert PE index to pointers */
for( wxN = 1; wxN < MAX_PES; wxN++ ) {
if ( (peP = PEIndexP[wxN]) == (PE *)0 )
continue;
for( ConnXN = peP->NConnsN, cP = &peP->ConnsS[0];
--ConnXN >= 0;
cP++ ) {
cP->SourceP = PEIndexP[ WORD(cP->SourceP) ];
}
}
if ( fP ) fclose( fP );
return( 0 );
ErrExit:
if ( fP ) fclose( fP );
FreeNet();
return( -1 );
}
/************************************************************************
* *
* PrintNet() - Print out Network *
* *
************************************************************************
*/
int PrintNet( fnP ) /* print out network */
char *fnP; /* file to print to (append) */
{
FILE *fP; /* file pointer */
PE *dpeP; /* destination PE */
int layerXN; /* layer index */
CONN *cP; /* connection pointer */
int ConnXN; /* connection index */
if ( *fnP == '\0' ) {
fP = stdout;
} else {
if ( (fP = fopen( fnP, "a" )) == (FILE *)0 ) {
printf( "Could not open print output file <%s>\n", fnP );
return( -1 );
}
}
for( layerXN = 0; (dpeP = LayerTP[layerXN]) != (PE *)0; layerXN++ ) {
fprintf( fP, "\nLayer %d with %d PEs\n", layerXN, LayerNI[layerXN] );
for(; dpeP != (PE *)0; dpeP = dpeP->NextPEP ) {
fprintf( fP,
" %2d %04x:%04x PE Output=%6.3f Error=%6.3f WorkR=%6.3f NConns=%d\n",
dpeP->PEIndexN, FP_SEG(dpeP), FP_OFF(dpeP),
dpeP->OutputR, dpeP->ErrorR, dpeP->WorkR, dpeP->NConnsN );
for( ConnXN = 0; ConnXN < dpeP->NConnsN; ConnXN++ ) {
cP = &dpeP->ConnsS[ConnXN];
fprintf( fP,
" Src=%2d %04x:%04x Weight=%7.3f Delta Wt=%6.3f\n",
cP->SourceP->PEIndexN,
FP_SEG(cP->SourceP), FP_OFF(cP->SourceP),
cP->WtValR, cP->DWtValR );
}
}
}
if ( fP != stdout )
fclose( fP );
return( 0 );
}
/************************************************************************
* *
* FullyConn() - Fully connect a source layer to a destination *
* *
************************************************************************
*/
int FullyConn( DLayN, SLayN, RangeR )
int DLayN; /* destination layer */
int SLayN; /* source layer */
double RangeR; /* range magnitude */
{
CONN *cP; /* connection pointer */
PE *speP; /* source PE pointer */
PE *dpeP; /* destination PE pointer */
/* loop through each of the PEs in the destination layer */
for( dpeP = LayerTP[DLayN]; dpeP != (PE *)0; dpeP = dpeP->NextPEP ) {
cP = &dpeP->ConnsS[dpeP->NConnsN]; /* start of connections */
if ( dpeP->NConnsN == 0 ) {
/* insert bias PE as first one */
cP->SourceP = &PEBias; /* bias PE */
cP->WtValR = RRandR( RangeR ); /* initial weight */
cP->DWtValR = 0.0;
cP++; /* account for this conn */
dpeP->NConnsN++;
}
/* loop through all PEs in source layer & make connections */
for( speP = LayerTP[SLayN]; speP != (PE *)0; speP = speP->NextPEP ) {
cP->SourceP = speP; /* point to PE */
cP->WtValR = RRandR( RangeR ); /* initial weight */
cP->DWtValR = 0.0;
cP++; /* account for this conn */
dpeP->NConnsN++;
}
}
return( 0 ); /* layers fully connected */
}
/************************************************************************
* *
* BuildNet() - Build all data structures for back-prop network *
* *
************************************************************************
*/
int BuildNet( NInpN, NHid1N, NHid2N, NOutN, ConnPrevF )
int NInpN; /* # of input PEs */
int NHid1N; /* # of hidden 1 PEs (zero if none) */
int NHid2N; /* # of hidden 2 PEs (zero if none) */
int NOutN; /* # of output PEs */
int ConnPrevF; /* 1=connect to all prior layers */
{
int ReqdPEsN; /* # of required PEs */
int LayerXN; /* layer index */
int SLayN, DLayN; /* source / destination layer indicies */
if ( NInpN <= 0 || NOutN <= 0 )
return( -1 ); /* could not build ! */
FreeNet(); /* kill existing net */
ReqdPEsN = NInpN + NHid1N + NHid2N + NOutN;
LayerXN = 0; /* layer index */
AllocLayer( LayerXN, NInpN, 0 ); /* input layer */
if ( NHid1N > 0 ) {
LayerXN++; /* next layer */
AllocLayer( LayerXN, NHid1N, NInpN );
if ( NHid2N > 0 ) {
LayerXN++;
AllocLayer( LayerXN, NHid2N, NHid1N + (ConnPrevF?NInpN:0) );
}
}
LayerXN++;
AllocLayer( LayerXN, NOutN, ConnPrevF?(NInpN+NHid1N+NHid2N):NHid2N );
/* connect up the layers */
for( DLayN = 1; LayerTP[DLayN] != (PE *)0; DLayN++ ) {
for( SLayN = ConnPrevF?0:(DLayN-1); SLayN < DLayN; SLayN++ )
FullyConn( DLayN, SLayN, 0.2 );
}
return( 0 );
}
/************************************************************************
* *
* Recall() - Step network through one recall cycle *
* *
************************************************************************
*/
int Recall( ivRP, ovRP ) /* perform a recall */
float *ivRP; /* input vector */
float *ovRP; /* output vector */
{
int DLayN; /* destination layer index */
PE *peP; /* work PE pointer */
CONN *cP; /* connection pointer */
int ConnC; /* connection counter */
double SumR; /* summation function */
for( DLayN = 0; (peP = LayerTP[DLayN]) != (PE *)0; DLayN++ ) {
for( ; peP != (PE *)0; peP = peP->NextPEP ) {
if ( DLayN == 0 ) {
/* input layer, output is just input vector */
peP->OutputR = ivRP[0]; /* copy input values */
peP->ErrorR = 0.0; /* clear error */
ivRP++;
} else {
/* hidden or output layer, compute weighted sum & transform */
ConnC = peP->NConnsN; /* # of connections */
cP = &peP->ConnsS[0]; /* pointer to connections */
SumR = 0.0; /* no sum yet */
for( ; --ConnC >= 0; cP++ )
SumR += cP->SourceP->OutputR * cP->WtValR;
peP->OutputR = 1.0 / (1.0 + exp(-SumR) );
peP->ErrorR = 0.0;
}
if ( LayerTP[DLayN+1] == (PE *)0 ) {
/* this is output layer, copy result back to user */
ovRP[0] = peP->OutputR; /* copy output value */
ovRP++; /* next output value */
}
}
}
return( 0 );
}
/************************************************************************
* *
* Learn() - step network through one learn cycle *
* *
************************************************************************
return: squared error for this training example
*/
double Learn( ivRP, ovRP, doRP, LearnR, MomR ) /* train network */
float *ivRP; /* input vector */
float *ovRP; /* output vector */
float *doRP; /* desired output vector */
double LearnR; /* learning rate */
double MomR; /* momentum */
{
double ErrorR = 0.0; /* squared error */
double LErrorR; /* local error */
PE *speP; /* source PE */
int DLayN; /* destination layer index */
PE *peP; /* work PE pointer */
CONN *cP; /* connection pointer */
int ConnC; /* connection counter */
Recall( ivRP, ovRP ); /* perform recall */
/* search for output layer */
for( DLayN = 0; (LayerTP[DLayN+1]) != (PE *)0; )
DLayN++;
/* compute error, backpropagate error, update weights */
for( ; DLayN > 0; DLayN-- ) {
for( peP = LayerTP[DLayN]; peP != (PE *)0; peP = peP->NextPEP ) {
if ( LayerTP[DLayN+1] == (PE *)0 ) {
/* output layer, compute error specially */
peP->ErrorR = (doRP[0] - peP->OutputR);
ErrorR += (peP->ErrorR * peP->ErrorR);
doRP++;
}
/* pass error back through transfer function */
peP->ErrorR *= peP->OutputR * (1.0 - peP->OutputR);
/* back-propagate it through connections & update them */
ConnC = peP->NConnsN; /* # of connections */
cP = &peP->ConnsS[0]; /* pointer to connections */
LErrorR = peP->ErrorR; /* local error */
for( ; --ConnC >= 0; cP++ ) {
speP = cP->SourceP;
speP->ErrorR += LErrorR * cP->WtValR; /* propagate error */
cP->DWtValR = /* compute new weight */
LearnR * LErrorR * speP->OutputR +
MomR * cP->DWtValR;
cP->WtValR += cP->DWtValR; /* update weight */
}
}
}
return( ErrorR );
}
/************************************************************************
* *
* Explain() - estimate the derivatives of the outputs with      *
*             respect to small changes in the inputs            *
* *
************************************************************************
Basic Procedure:
1) do a recall to find out what the nominal output values are
2) copy the nominal values to "WorkR" in the PE structure.
(We could have used the ErrorR field but WorkR was
used to reduce confusion.)
3) for each input:
a) Add a small amount to the input value (DitherR)
b) do a Recall & compute derivative of output
c) subtract small amount from nominal input value
d) do a Recall & compute derivative of outputs
e) Average two derivatives
*/
int Explain( ivRP, ovRP, evRP, DitherR )
float *ivRP; /* input vector */
float *ovRP; /* output result vector */
float *evRP; /* explain vector */
double DitherR; /* dither */
{
PE *speP; /* source PE (input) */
int speXN; /* source PE index */
PE *dpeP; /* destination PE (output) */
int dpeXN; /* destination PE index */
int OutLXN; /* output layer index */
/* figure out index of output layer */
for( OutLXN = 0; LayerTP[OutLXN+1] != (PE *)0; )
OutLXN++;
Recall( ivRP, ovRP ); /* set up initial recall */
/* go through output layer and copy output to "WorkR" */
for( dpeP = LayerTP[OutLXN]; dpeP != (PE *)0; dpeP = dpeP->NextPEP )
dpeP->WorkR = dpeP->OutputR;
/* for each input, compute its effects on the output */
for( speXN = 0, speP = LayerTP[0];
speP != (PE *)0;
speXN++, speP = speP->NextPEP ) {
/* dither in positive direction */
ivRP[speXN] += DitherR; /* add dither */
Recall( ivRP, ovRP ); /* new output */
/* set initial results to evRP */
for( dpeXN = 0, dpeP = LayerTP[OutLXN];
dpeP != (PE *)0;
dpeXN++, dpeP = dpeP->NextPEP )
evRP[dpeXN] = 0.5 * (dpeP->OutputR - dpeP->WorkR) / DitherR;
/* dither in negative direction */
ivRP[speXN] -= (DitherR + DitherR); /* subtract dither */
Recall( ivRP, ovRP ); /* new output */
/* set final results to evRP */
for( dpeXN = 0, dpeP = LayerTP[OutLXN];
dpeP != (PE *)0;
dpeXN++, dpeP = dpeP->NextPEP )
evRP[dpeXN] -= 0.5 * (dpeP->OutputR - dpeP->WorkR) / DitherR;
/* point to next row of explain vector */
evRP += dpeXN;
/* restore current input to original value */
ivRP[speXN] += DitherR;
}
return( 0 );
}
/************************************************************************
* *
* Network Training Data *
* *
************************************************************************
+--------------+-----------+ 1.0
| | |
2 | zero | |
| | one |
t +--------------+ | 0.7
u | |
p | |
n +--------------------------+ 0.3
I | |
| zero |
| |
+--------------------------+ 0.0
0.0 0.5 1.0
Input 1
Input 3 is "noise".
Output 1 is shown above.
Output 2 is opposite of output 1.
*/
typedef struct _example { /* example */
float InVecR[3]; /* input vector */
float DoVecR[2]; /* desired output vector */
} EXAMPLE;
#define NTEST (sizeof(testE)/sizeof(testE[0])) /* # of test items */
EXAMPLE testE[] = {
/* --- Inputs --- --- Desired Outputs -- */
{ 0.0, 0.0, 0.0, 0.0, 1.0 },
{ 0.0, 0.2, 0.6, 0.0, 1.0 },
{ 0.0, 0.4, 0.1, 1.0, 0.0 },
{ 0.0, 0.6, 0.7, 1.0, 0.0 },
{ 0.0, 0.8, 0.2, 0.0, 1.0 },
{ 0.0, 1.0, 0.8, 0.0, 1.0 },
{ 0.2, 0.0, 0.9, 0.0, 1.0 },
{ 0.2, 0.2, 0.3, 0.0, 1.0 },
{ 0.2, 0.4, 0.8, 1.0, 0.0 },
{ 0.2, 0.6, 0.2, 1.0, 0.0 },
{ 0.2, 0.8, 0.7, 0.0, 1.0 },
{ 0.2, 1.0, 0.1, 0.0, 1.0 },
{ 0.4, 0.0, 0.0, 0.0, 1.0 },
{ 0.4, 0.2, 0.6, 0.0, 1.0 },
{ 0.4, 0.4, 0.1, 1.0, 0.0 },
{ 0.4, 0.6, 0.7, 1.0, 0.0 },
{ 0.4, 0.8, 0.2, 0.0, 1.0 },
{ 0.4, 1.0, 0.8, 0.0, 1.0 },
{ 0.6, 0.0, 0.9, 0.0, 1.0 },
{ 0.6, 0.2, 0.3, 0.0, 1.0 },
{ 0.6, 0.4, 0.8, 1.0, 0.0 },
{ 0.6, 0.6, 0.2, 1.0, 0.0 },
{ 0.6, 0.8, 0.7, 1.0, 0.0 },
{ 0.6, 1.0, 0.1, 1.0, 0.0 },
{ 0.8, 0.0, 0.4, 0.0, 1.0 },
{ 0.8, 0.2, 0.6, 0.0, 1.0 },
{ 0.8, 0.4, 0.1, 1.0, 0.0 },
{ 0.8, 0.6, 0.7, 1.0, 0.0 },
{ 0.8, 0.8, 0.2, 1.0, 0.0 },
{ 0.8, 1.0, 0.8, 1.0, 0.0 },
{ 1.0, 0.0, 1.0, 0.0, 1.0 },
{ 1.0, 0.2, 0.3, 0.0, 1.0 },
{ 1.0, 0.4, 0.8, 1.0, 0.0 },
{ 1.0, 0.6, 0.2, 1.0, 0.0 },
{ 1.0, 0.8, 0.7, 1.0, 0.0 },
{ 1.0, 1.0, 0.0, 1.0, 0.0 }
};
int TestShuffleN[ NTEST ] = {0}; /* shuffle array */
int TestSXN = NTEST+1; /* current shuffle index; >= NTEST forces a shuffle on first call */
/************************************************************************
* *
* NextTestN() - Do shuffle & deal randomization of training set *
* *
************************************************************************
*/
int NextTestN() /* Get next Training example index */
{
int HitsN; /* # of items we have added to list */
int wxN; /* work index into shuffle array */
int xN,yN; /* indices of items to swap */
if ( TestSXN >= NTEST ) {
/* reshuffle the array */
for( wxN = 0; wxN < NTEST; wxN++ )
TestShuffleN[wxN] = wxN;
/* quick & dirty way to shuffle. Much better ways exist. */
for( HitsN = 0; HitsN < NTEST+NTEST/2; HitsN++ ) {
xN = rand() % NTEST;
yN = rand() % NTEST;
wxN = TestShuffleN[xN];
TestShuffleN[xN] = TestShuffleN[yN];
TestShuffleN[yN] = wxN;
}
TestSXN = 0;
}
return( TestShuffleN[TestSXN++] );
}
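As the comment in NextTestN() concedes, the random-swap loop above is quick and dirty: some permutations come up more often than others. The standard fix is the Fisher-Yates shuffle, which walks down the array once and swaps each slot with a randomly chosen earlier-or-equal slot. Here is a sketch of the reshuffle step rewritten that way; it borrows the TestShuffleN name and the NTEST count from the listing but is a self-contained illustration, not a tested drop-in for the rest of the program:

```c
/* Fisher-Yates reshuffle sketch for the training-set ordering.
 * NTEST is hard-coded here to match sizeof(testE)/sizeof(testE[0])
 * in the listing (36 examples). */
#include <stdlib.h>

#define NTEST 36
int TestShuffleN[ NTEST ];           /* shuffle array */

void ShuffleTests( void )
{
    int wxN, xN, tN;

    for( wxN = 0; wxN < NTEST; wxN++ )
        TestShuffleN[wxN] = wxN;     /* identity permutation */
    /* swap each slot with a random slot at or below it; every
     * permutation is then equally likely, up to the small modulo
     * bias of rand() % n */
    for( wxN = NTEST-1; wxN > 0; wxN-- ) {
        xN = rand() % (wxN + 1);
        tN = TestShuffleN[wxN];
        TestShuffleN[wxN] = TestShuffleN[xN];
        TestShuffleN[xN] = tN;
    }
}
```

One pass of NTEST-1 swaps replaces the original's NTEST + NTEST/2 random transpositions, and the result is uniformly distributed rather than merely well stirred.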
/************************************************************************
* *
* TrainNet() - Driver for training Network *
* *
************************************************************************
*/
int TrainNet( ErrLvlR, MaxPassN ) /* train network */
double ErrLvlR; /* error level to achieve */
long MaxPassN; /* max number of passes */
{
float rvR[MAX_PES]; /* result vector */
double lsErrR;
int CurTestN; /* current test number */
int HitsN; /* # of times below threshold */
int PassN; /* pass through the data */
int ExampleN; /* example number */
HitsN = 0;
CurTestN = 0;
lsErrR = 0.0;
PassN = 0;
for(;;) {
ExampleN = NextTestN(); /* next test number */
lsErrR += Learn( &testE[ExampleN].InVecR[0],
&rvR[0],
&testE[ExampleN].DoVecR[0], 0.9, 0.5 );
CurTestN++;
if ( CurTestN >= NTEST ) {
PassN++;
lsErrR = sqrt(lsErrR)/ (double)NTEST;
if ( lsErrR < ErrLvlR )
HitsN++;
else HitsN = 0;
printf( "Pass %3d Error = %.3f Hits = %d\n",
PassN, lsErrR, HitsN );
if ( PassN > MaxPassN || HitsN > 3 ) /* exit criteria */
break;
CurTestN = 0;
lsErrR = 0.0;
}
}
/* done training, start testing */
return( 0 );
}
/************************************************************************
* *
* ExplainNet() - do explain & print it out *
* *
************************************************************************
*/
int ExplainNet( fnP, DitherR ) /* explain & print */
char *fnP; /* output file name */
double DitherR; /* amount to dither */
{
FILE *fP; /* file pointer */
int xN, yN; /* x,y values */
static float ivR[MAX_PES] = {0}; /* input vector */
static float ovR[MAX_PES] = {0}; /* work area for output data */
static float evR[MAX_PES*MAX_PES] = {0}; /* explain vector */
/* evR[0] = dY1 vs Input 1
* evR[1] = dY2 vs Input 1
* evR[2] = dY1 vs Input 2
* evR[3] = dY2 vs Input 2
* evR[4] = dY1 vs Input 3
* evR[5] = dY2 vs Input 3
*/
if ( *fnP == '\0' ) {
fP = stdout;
} else {
if ( (fP = fopen( fnP, "a" )) == (FILE *)0 ) {
printf( "Could not open explain output file <%s>\n", fnP );
return( -1 );
}
}
fprintf( fP,
"\f\n*** Network Output as a function of inputs 1 & 2 ***\n\n" );
ivR[2] = 0.5;
for( yN = 20; yN >= 0; yN-- ) {
if ( (yN % 2) == 0 ) fprintf( fP, "%6.2f | ", yN/20. );
else fprintf( fP, " | " );
for( xN = 0; xN <= 20; xN++ ) {
ivR[0] = xN / 20.;
ivR[1] = yN / 20.;
Recall( &ivR[0], &ovR[0] );
/* --- suppress very small output values --- */
if ( fabs(ovR[0]) < .1 ) fprintf( fP, " - " );
else fprintf( fP, "%5.1f", ovR[0] );
}
fprintf( fP, "\n" );
}
fprintf( fP, " +-" );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, "-----" );
fprintf( fP, "\n " );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, (xN % 2)==0?"%5.1f":" ", xN/20. );
fprintf( fP, "\n" );
fprintf( fP,
"\f\n*** Plot of Explain Function for Input 1 over input range ***\n\n" );
ivR[2] = 0.5;
for( yN = 20; yN >= 0; yN-- ) {
if ( (yN % 2) == 0 ) fprintf( fP, "%6.2f | ", yN/20. );
else fprintf( fP, " | " );
for( xN = 0; xN <= 20; xN++ ) {
ivR[0] = xN / 20.;
ivR[1] = yN / 20.;
Explain( &ivR[0], &ovR[0], &evR[0], DitherR );
/* --- ignore very small changes --- */
if ( fabs(evR[0]) < .1 ) fprintf( fP, " - " );
else fprintf( fP, "%5.1f", evR[0] );
}
fprintf( fP, "\n" );
}
fprintf( fP, " +-" );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, "-----" );
fprintf( fP, "\n " );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, (xN % 2)==0?"%5.1f":" ", xN/20. );
fprintf( fP, "\n" );
fprintf( fP,
"\f\n*** Plot of Explain Function for Input 2 over input range ***\n\n" );
ivR[2] = 0.5;
for( yN = 20; yN >= 0; yN-- ) {
if ( (yN % 2) == 0 ) fprintf( fP, "%6.2f | ", yN/20. );
else fprintf( fP, " | " );
for( xN = 0; xN <= 20; xN++ ) {
ivR[0] = xN / 20.;
ivR[1] = yN / 20.;
Explain( &ivR[0], &ovR[0], &evR[0], DitherR );
/* --- ignore very small changes --- */
if ( fabs(evR[2]) < .1 ) fprintf( fP, " - " );
else fprintf( fP, "%5.1f", evR[2] );
}
fprintf( fP, "\n" );
}
fprintf( fP, " +-" );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, "-----" );
fprintf( fP, "\n " );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, (xN % 2)==0?"%5.1f":" ", xN/20. );
fprintf( fP, "\n" );
fprintf( fP,
"\f\n*** Plot of Explain Function for Input 3 over input range ***\n\n" );
ivR[2] = 0.5;
for( yN = 20; yN >= 0; yN-- ) {
if ( (yN % 2) == 0 ) fprintf( fP, "%6.2f | ", yN/20. );
else fprintf( fP, " | " );
for( xN = 0; xN <= 20; xN++ ) {
ivR[0] = xN / 20.;
ivR[1] = yN / 20.;
Explain( &ivR[0], &ovR[0], &evR[0], DitherR );
/* --- ignore very small changes --- */
if ( fabs(evR[4]) < .1 ) fprintf( fP, " - " );
else fprintf( fP, "%5.1f", evR[4] );
}
fprintf( fP, "\n" );
}
fprintf( fP, " +-" );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, "-----" );
fprintf( fP, "\n " );
for( xN = 0; xN <= 20; xN++ )
fprintf( fP, (xN % 2)==0?"%5.1f":" ", xN/20. );
fprintf( fP, "\n" );
if ( fP != stdout )
fclose( fP );
return( 0 );
}
/************************************************************************
* *
* main() - Driver for entire program                                    *
* *
************************************************************************
*/
main()
{
int ActionN; /* action character */
char *sP; /* string pointer */
char *aP; /* alternate pointer */
char BufC[80]; /* work buffer */
printf( "\nC-Program to Explain a Neural Network's Conclusions\n" );
printf( " Written by: Casimir C. 'Casey' Klimasauskas, 04-Jan-91\n" );
for(;;) {
printf( "\
C - create a new network\n\
L [fname] - load a trained network\n\
S [fname] - save a network\n\
P [fname] - print out network\n\
F - free network from memory\n\
T - Train network\n\
E [fname] - Explain network\n\
X - eXit from the program\n\
What do you want to do? " );
fflush( stdout );
sP = fgets( BufC, sizeof(BufC)-1, stdin );
if ( sP == (char *)0 )
break;
while( *sP != 0 && *sP <= ' ' )
sP++;
ActionN = *sP;
if ( 'A' <= ActionN && ActionN <= 'Z' )
ActionN -= 'A'-'a'; /* convert to LC */
sP++;
while( *sP != 0 && *sP <= ' ' )
sP++; /* skip to argument */
for( aP = sP; *aP > ' '; )
aP++; /* skip to end of argument */
*aP = '\0'; /* null terminate it */
switch( ActionN ) {
case 'c': /* create network */
BuildNet( 3, 5, 0, 2, 1 );
break;
case 'l': /* load network */
if ( *sP == '\0' )
sP = "network.net";
LoadNet( sP );
break;
case 's': /* save network */
if ( *sP == '\0' )
sP = "network.net";
SaveNet( sP );
break;
case 'p': /* print network */
PrintNet( sP );
break;
case 'f': /* free network */
FreeNet();
break;
case 't': /* train network */
TrainNet( 0.001, 100000L );
break;
case 'e': /* explain network */
ExplainNet( sP, .01 );
break;
case 'x': /* done */
goto Done;
default:
break;
}
}
Done:
return( 0 );
}
Copyright © 1991, Dr. Dobb's Journal