Gary has a BS in physics from the University of California at Irvine and a PhD from the University of Hawaii. He is currently doing research work at the Fermi National Accelerator Laboratory and can be reached at gary@master.ps.uci.edu.
Many quantities are natural variables to examine statistically but are useless as test statistics because their error isn't easily estimated. Still, many possible test statistics are easily computed. For example, computing the median value of a data set requires little more than sorting the data and finding the middle value. The second moment of a distribution is calculated by simply summing the square of the difference between each value and the mean value. These types of quantities are frequently used as some sort of decision criteria, but a single value itself does not allow rejection of a hypothesis. Instead, hypotheses are rejected based on confidence intervals; however, calculating the confidence interval for the median or moments is extremely difficult using conventional techniques. What we need is a calculable and accurate estimation of probability distributions for complex quantities.
Conventional error estimation requires assumptions about the shape of the distributions involved. Much effort has been dedicated to formulating more precise approximation methods for error estimation and propagation, but they inevitably rely on many assumptions. For example, if C=A+B and A and B are sets of measured data, the standard method for estimating the value and error of C is to calculate the mean values for A and B and use those values to calculate the mean value for C. Then, the error on
Systematic errors resulting from bad assumptions are propagated throughout the calculation, leaving an ambiguous result. Accurate error propagation using conventional methods is a delicate process, sensitive to assumptions. If the true distribution is not approximated well by a Gaussian, but is instead skewed with a long tail, treating that distribution as a Gaussian induces a large systematic uncertainty in the final result. Many analyses are proven faulty due to bad assumptions and thereby cast doubt on all tenuous results. The significance required for a formidable result has increased throughout the years to a level that precludes many experiments.
Proper signal analysis requires a technique which minimizes assumptions. If the final result is calculated without assuming distribution functions or using crude approximation methods for propagating the error through an equation, the calculated significance is reliable. Such a method allows the analysis of tenuous signals because a borderline result lacks calculational uncertainties. It exactly states the probability of the result occurring randomly without many assumptions convolved into the calculation.
When Efron extended the Jackknife technique to form the Bootstrap technique (see "References"), he provided a method to easily propagate errors through complex calculations, resulting in an estimate of the error with few assumptions. The Bootstrap is a resampling algorithm that estimates the error on quantities by resampling the data in random ways. The astrophysics community adapted this technique to search for tenuous signals in multidimensional data, in which the mechanisms and local efficiencies are not well understood. Generating a background via the Bootstrap allows a sensitive analysis without assumptions convolved into the calculation. The measured data is assumed to be a discrete sampling of the background distribution and can thus be used to calculate the background. By using the data itself to calculate the background, local efficiencies and systematic uncertainties are directly incorporated into the calculation. This technique provides a way to analyze signals without assuming efficiency functions or probability distributions. When it's crucial to detect signals without many assumptions convolved into the calculation, the Bootstrap provides a sensitive technique.
Signals are often regarded as a correlation between measured quantities; however, the most basic signal emerges as a measured value that differs from our expectation. The Bootstrap provides signal detection by generating the probability distribution for the measured value. Put the data through the same process many times, but sample the quantities with replacements from the data each time. This process will generate many estimates, which are binned to make a distribution. Use that distribution to test the significance of the result. Instead of knowing perfectly the components needed to calculate the background, the data itself is used to generate the background and thereby accounts for local biases.
The Bootstrap begins by sampling values randomly with replacements from the data to form a fake set. Suppose, for example, that the data is 2,5,7. You select three random numbers between 1 and 3, say, 1,3,1. Then you choose the data values at the location pointed to by the random numbers to form the fake set 2,7,2. Use this data to calculate the test variable. Each time the procedure is repeated, a different background estimate results. As the different values are found, bin them into a distribution and keep running totals for mean and variance calculations.
The Bootstrap recognizes that the data is a discrete sampling of the background distribution; thus, the background distribution can be built from the data. Binning these background estimates forms the entire probability distribution for the measured variable. This distribution tests the statistical significance of the measured value. Probability theory states that integrating the probability distribution from a value to [infinity] measures the probability of randomly obtaining that value. So, with the entire distribution in hand, find the significance by integrating the distribution out to the expected value.
If the distribution is approximately Gaussian, it may be more convenient to express the results using the mean and standard deviation rather than the entire distribution. It is therefore recommended to keep the running totals necessary for mean and variance calculations. Just remember, expressing the results in this way adds another assumption into the process. It's preferable to express the random-occurrence probability, since the mean and standard deviation suppose a Gaussian distribution.
When doing conventional theoretical background calculations, the distributions of the input quantities are previously measured or calculated theoretically, and those distributions are used to obtain discrete values for the calculation. Furthermore, some sort of noise is added to make the calculation more realistic. Using the measured quantities in hand that represent a sampling of that distribution with the correct noise and errors included is more economical than performing conventional background calculation.
As with any Monte Carlo calculation, the Bootstrap technique relies heavily on a pseudorandom-number generator. Since this calculation supposedly removes correlation, it's essential that no correlation exist in the random numbers out of the generator. To achieve this, you must use a good random-number generator, which eliminates most system-supplied generators. These generators tend to have a slightly nonuniform distribution and often have bad time-occurrence correlations. This is undesirable for small signals, but tolerable for everyday calculations. In this article, I use ran1 from Numerical Recipes, which does well, but is much slower than most system routines. Whichever generator you use, it must be thoroughly tested for both uniform distribution and uncorrelated time occurrence. Knuth's spectral test is great for detecting correlated occurrence.
The error on the mean background estimate converges slowly, necessitating many repetitions of this procedure to get a precise estimate. Statistics show that the error of the mean of a sampled distribution is
With the probability distribution in discrete form, the confidence intervals are easily calculated. Instead of integrating the binned distribution, do the integral as the estimates are generated. Counting the number of background estimates above and below the test value, P=Nabove/(Nbelow+Nabove), immediately gives the probability for the value to occur randomly greater than the mean background estimate. If the estimates are binned, the probability is also available from the histogram, but the square shape of the bins causes a slight loss of precision. That error could be reduced with other methods, like trapezoid integration, but the counting method is clearly the best.
The first example demonstrates the original Bootstrap technique by calculating the median and its error for a measured set of grades for a hypothetical university course called Intro to Computer Science 101. If an administrator believes a well-run class results in a median grade of 50 percent, determining if ICS 101 is taught well is very difficult using conventional techniques--unless the median is exactly 50 percent. Since there's only one sample, the error on the median grade must be determined to see if the median grade is statistically different from 50 percent. Since a median higher or lower than 50 percent is tested for, a two-tail confidence interval is appropriate. The grades for ICS 101 are as follows: 45, 67, 73, 27, 44, 96, 49, 38, 51, 63, 11, 87, 67, 46, 76, 57, 53, 61, 57, 64. Figure 1 shows the distribution of grades. The median grade is 57. The program in Listing One (page 81) calculates the median grade distribution (also shown in Figure 1) using the Bootstrap technique. The median grade for ICS 101 had a 13 percent probability for occurring randomly, well within the 5 percent confidence limits chosen by the administrator.
Signals in multidimensional data appear as a significantly large measurement correlated with particular values for other measured quantities. To detect the significance or strength of the signal, it must be compared to the background. Conventional testing schemes assume that the background follows some fit function, and the errors are given by an assumed probability distribution. Otherwise, they calculate a theoretical background and assume the error on the measured values. These techniques are appropriate for large signal-to-noise ratios, but untrustworthy otherwise. When searching for small signals in a noisy background, extending the Bootstrap to correlated signals provides a new method that relies on fewer assumptions to calculate the background distribution.
Signals in multidimensional data are seen by a correlation in measurements. The most prevalent examples are signals seen by an increase in one variable when another variable is changed to a particular value. For example, if a directional microphone were turned through 360 degrees and there happened to be a loud car at 37 degrees, the decibel meter would peak at 37 degrees. Thus, there's an interdependence between the angle and decibel measurements.
Conventional techniques fit the background to a function with the supposed signal removed. Unfortunately, the shape of the fit function often correlates strongly with the chosen points of where the signal begins and ends. Assume a distribution with a "reasonable" width for the error on the background map points. If the data is counting data, the map points have a Poisson distribution. In any event, the end result is untrustworthy for small signals.
Some experiments attempt to calculate the background theoretically with some sort of noise added to make the calculation more realistic. This technique is usually the least sensitive possible, since few of the fundamental processes are understood well enough to yield a calculation with small systematic uncertainty.
Searching for small correlation signals in noisy data and systematic biases lends itself to a variation in the Bootstrap technique; the result is a new, powerful method of signal analysis. Instead of sampling the distribution with replacements, you randomly shuffle one quantity with respect to the others to eliminate the correlation. As before, repeat this process many times to build up an estimate of the background distribution at each point on the map. The distributions have the signal included in them and thus give a conservative estimate. This process is also better for large data sets than tradition sampling because the events are shuffled in the same memory location rather than allocating new memory for the sampled events.
Remove the point-by-point correlation by considering the two sets of measured quantities separate and independent. Because they're independent, the values can be shuffled into any random order. This technique essentially averages over one variable while including the biases in the other variables. To sample uncorrelated events, it's efficient to shuffle the values with respect to each other using a technique developed by Knuth. After shuffling, read the arrays of data sequentially for each point. Reading the entire shuffled data set and making a background map provides one background estimate.
As the median example showed, repeating the Bootstrap process many times yields many discrete samplings of the background distribution. Bin these samples to see the distribution or use them to directly determine the significance. Using these samples to compute the significance of the measured values is easy. Count the number of estimates above and below the test value for the probability, or calculate the mean and standard deviation and assume a Gaussian distribution.
Having no systematic selection of the values used in this procedure includes the signal into the background estimate. One benefit of using the data with the signal included is that this generates a conservative background estimate. The common criticism of background subtraction results is that slight variations in the assumed background shape can wildly vary the results, which indicates that the results are not trustworthy. The Bootstrap calculates the background with the signal values included, generating a conservative, stable estimate. As previously demonstrated, this estimate is also free of the many assumptions normally required. Altogether, this technique yields a sensitive and trustworthy result.
The next example uses the Bootstrap to see a small signal from a star in a telescope. A simple telescope that measures nothing more than total light level sweeps past a star to detect whether or not it can see the light from the star; see Figure 2(a). The two measured values are the angle of the telescope (
The program in Listing Two (page 81) calculates the background distributions by randomizing the angle with respect to the light level. Doing so essentially averages the light levels over the different angles. Figure 3 shows the results. Notice that the data shows a measurement above the background at the angle of the star. The program indicates the random chance probability at
Perhaps the greatest benefit from this type of analysis is the ease with which systematic biases are handled. Since the data includes biases, using the data to calculate the background includes the systematic biases, provided you choose to randomize about the correct variables. If there were a strong bias against a combination of two angles, it would not be preferable to randomize out that correlation. Instead, randomizing the set of those angles with respect to another parameter provides the needed calculation while including the bias in angles. To illustrate this point, the next example adds a bias into the data.
Now let's extend the previous discussion to account for the earth's rotation. Table 2 shows the angle of the earth
Figure 5 shows the results of the analysis. The histogram represents the measured values, the plotted points show the mean values of the background distribution, and the error bars represent the standard deviation of the background distribution. The program also generates the random-chance probabilities for the measured variables. The star is seen at
Although the last example performed well, this technique is actually best for counting data. If the data in Table 2 is converted to counting data, both biases are handled optimally. For instance, treat the first light level of 23 to actually mean there are 23 photons collected; replace that one point with 23 values of
The results from this example handle the biases perfectly. Figure 6 shows the results. The random-chance probability for the observation is extremely small, P(
The Bootstrap is the best possible choice for many signal-analysis situations. It is one of few ways to propagate errors through complex transformations with little difficulty, and it requires few assumptions. Its ease of use, accuracy, excellent handling of local biases, and O(n) run time make it an essential tool for contemporary signal analysis.
Efron, B. Annals of Statistics, 7, 1979.
----. The Jackknife, the Bootstrap and Other Resampling Plans, CBMS 38, 1982.
Flannery, B.P., et al. Numerical Recipes. Cambridge: Cambridge University Press, 1986.
Knuth, D. The Art of Computer Programming. Reading, MA: Addison-Wesley, 1973.
Copyright © 1994, Dr. Dobb's Journal
C is found with the formula in Example 1, where
2AB is the covariance between the two quantities and
2A and
2B are the variances. We use A and B to calculate the confidence interval for C, and the distribution of C is assumed to be Gaussian. These formulas are valid only under certain assumptions about the distributions of A and B. Furthermore, calculating the confidence interval with standard techniques asserts the probability distribution for C and uses the calculated mean and standard deviation.The Bootstrap
Implementation
=s.d. / [sqrt]N, where N is the number of points in the sample. Therefore, the calculated mean approaches the "true" mean as 1 / [sqrt]N. For a data sample of 100 numbers, the Bootstrap yields an estimate of approximately 10 percent. For an accurate value, repeat the Bootstrap procedure many times. There is one exception to this rule: The fewer the data points, the smaller the total number of permutations. Generating many more estimates than there are permutations is wasteful. Repeating too many times calculates the same numbers repetitively, doing little to enhance the accuracy of the calculation.Correlation Signals
), measured from some reference point, and the light level in the telescope. The signal is detected by a correlation between the light level and the angle at which the star is located (
=0). A turbulent atmosphere further complicates the measurement by randomly fluctuating the light level. Table 1 shows the measured data.
=0 is 8 percent; thus, the star seems to be detected by this telescope with 92 percent confidence. The signal might appear stronger than estimated to the human eye, but that illustrates the problem with fitting functions to make a background. This result shows that there is a signal, given no assumptions. Although this example is useful to demonstrate the procedure, it trivializes the advantages of this technique.
, the angle of the telescope
, and the light level measured by the telescope shown in Figure 2(b). In this example, there is a large bias to the measurement. When the data was taken, there happened to be a tree branch overhead shadowing telescope angles between -5 and +5 degrees. The experimenter looks at the distributions shown in Figure 4 and sees there is a bias, which leads him or her to randomize the earth's angle. Randomizing the earth's angle separately from the telescope angle and light level grouped together leaves the strong telescope-angle bias in the calculation. The program in Listing Three (page 81) calculates the background in this way.
=30 degrees with a 95 percent confidence.
=0 and
= -10. After doing this for all the points, repeat the process with only two changes. First, instead of incrementing the background map by the light level associated with the
value, increment it by one for each
,
pair. Second, calculate the mean
as done previously, but set the
=[sqrt]
since this is counting data. Counting data is distributed in a Poisson distribution, which can be approximated by a Gaussian with a variance equal to the mean. Listing Four (page 82) converts the data from the previous example into counting data and then performs the suggested algorithms.
=30_) ~0. Once again, notice that the Bootstrap slightly overestimates the background because the signal is included. This final result does not assume that any local-angle distributions are uniform. Instead, it entirely accounts for the distributions with only the angle-to-angle interdependence removed. By doing so, it provides a formidable result relying on few assumptions.Conclusions
References
Example 1: Error on
C is found with this formula. Figure 1: Grade distribution (wide histogram) and Bootstrap-generated median-probability distribution (small histogram).
Figure 2: (a) The measured values are the angle of the telescope
, measured from some reference point, and the light level in the telescope. The signal is detected by a correlation between the light level and the angle at which the star is located (
=0); (b) large bias to the measurement. Figure 3: Observed (histogram) and predicted (points) light levels vs.
. The source is at
=0 degrees. Figure 4: (a) Light level vs. telescope angle; (b) light level vs. earth angle.
Figure 5: Observed (histogram) and predicted (points) light levels vs.
. The source is at
=30 degrees. Figure 6: Observed (histogram) and predicted (points) light levels vs.
. The source is at
=30 degrees.Table 1: Data from a telescope that measures total light level.
Theta Light Level
-------------------
-5 1.1
-4 0.3
-3 1.9
-2 1.6
-1 2.3
0 3.3
1 1.8
2 1.0
3 1.7
4 1.6
5 0.8Table 2: The angle of the earth (_), the angle of the telescope (q), and the light level measured by the telescope shown in Figure 2(b).
Phi Theta Light Level
---------------------------
0 -10 23
0 -5 11
0 0 10
0 5 10
0 10 20
10 -10 19
10 -5 9
10 0 13
10 5 10
10 10 24
20 -10 20
20 -5 11
20 0 11
20 5 13
20 10 35
30 -10 22
30 -5 13
30 0 17
30 5 14
30 10 25
40 -10 34
40 -5 13
40 0 9
40 5 10
40 10 19
50 -10 22
50 -5 10
50 0 8
50 5 11
50 10 25
[LISTING ONE]
#include<stdio.h>
#include<math.h>
#include<stdlib.h>
#define NumOfGrades 20
#define NumOfBins 20
#define TestMedian 50
#define TGA {45,67,73,27,44,96,49,38,51,63,11,87,67,46,76,57,53,61,57,64}
#define nBSLoops 1000000
main()
{
int i, j, index, numOver=0;
int grades[NumOfGrades]=TGA, fake[NumOfGrades], iseed = -534;
float sum=0, sumSq=0, median, fmedian, ave, sdev, prob1Tail, prob2Tail;
float hist[NumOfBins];
extern int cmp();
extern float ran1(), FindMedian();
/*Make sure histogram starts at 0*/
for( i=0; i<NumOfBins; i++ )
hist[i]=0.0;
/*First, find the true median*/
median = FindMedian( grades );
printf( "median=%f\n", median );
for( i=0; i<nBSLoops; i++ )
{
/*Make a fake array by sampling the grades with replacement*/
for( j=0; j<NumOfGrades; j++ )
{
index = (int)( (float)NumOfGrades * ran1( &iseed ) );
fake[j] = grades[ index ];
}
/*Find the median for that fake set of grades*/
fmedian = FindMedian( fake );
/*Find the components needed later for mean and probability*/
sum += fmedian;
sumSq += fmedian*fmedian;
if( fmedian >= TestMedian )
numOver++;
/*Build histogram*/
index = (int)( (fmedian - 30.0)/2.0 );
if( (index >= 0) && (index < NumOfBins) )
hist[index]++;
}
/*Output the histogram for printing purposes*/
printf( "median probability distribution\n" );
for( i=0; i<NumOfBins; i++ )
printf( "%d %f\n", (2*i+31), ((float)hist[i]/(float)nBSLoops));
/*The probability of random occurance*/
prob1Tail = 1.0 - (float)(numOver)/(float)(nBSLoops);
prob2Tail = 2.0*prob1Tail;
printf( "1-tail-prob=%f 2-tail-prob=%f\n", prob1Tail, prob2Tail );
/*Find the mean and stdev*/
ave = sum/nBSLoops;
sdev = sqrt( (sumSq-ave*ave*nBSLoops)/( (float)nBSLoops-1.0 ) );
printf( "average=%f sdev=%f\n", ave, sdev );
exit(0);
}
float FindMedian( int array[NumOfGrades] )
{
float median;
extern int cmp();
qsort( array, NumOfGrades, sizeof(int), cmp );
median = ( (float)(array[9]+array[10]) / 2.0 );
return( median );
}
int cmp( int *in1, int *in2 )
{
int flag;
flag = *in1 - *in2;
return( flag );
}
[LISTING TWO]
#include<stdio.h>
#include<math.h>
#define NumOfAngles 11
#define TMA { 1.1, 0.3, 1.9, 1.6, 2.3, 3.3, 1.8, 1.0, 1.7, 1.6, 0.8 }
#define TAA { -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5 }
#define nBSLoops 1000000
main()
{
int i, j, index, numOver[NumOfAngles];
int iseed = -534;
int angle[NumOfAngles]=TAA;
float measure[NumOfAngles]=TMA, fmeasure[NumOfAngles]=TMA;
float sum[NumOfAngles], sumSq[NumOfAngles], mean[NumOfAngles], temp;
float sdev[NumOfAngles], prob1Tail[NumOfAngles], fm, tempMeasure;
extern float ran1();
for( i=0; i<NumOfAngles; i++ )
{
sum[i] = 0;
sumSq[i] = 0;
numOver[i] = 0;
}
for( i=0; i<nBSLoops; i++ )
{
/*Make a fake array by performing Knuth shuffling*/
j = NumOfAngles;
do
{
index = (int)( (float)j * ran1( &iseed ) );
tempMeasure = fmeasure[ index ];
fmeasure[ index ] = fmeasure[ j ];
fmeasure[ j ] = tempMeasure;
j--;
}
while( j > 0 );
/*Find the components needed later for mean and probability*/
for( j=0; j<NumOfAngles; j++ )
{
fm = fmeasure[j];
sum[j] += fm;
sumSq[j] += fm*fm;
if( fm >= measure[j] )
numOver[j]++;
}
}
for( i=0; i<NumOfAngles; i++ )
{
/*The probability of random occurance*/
prob1Tail[i] = ((float) numOver[i])/nBSLoops;
/*Find the mean and stdev*/
mean[i] = sum[i]/nBSLoops;
temp = (sumSq[i] - mean[i]*mean[i]*nBSLoops)/( nBSLoops-1.0 );
sdev[i] = sqrt( temp );
printf( "\nAt angle=%d\n", angle[i] );
printf( "measured=%f mean=%f sdev=%f\n", measure[i],
mean[i], sdev[i] );
printf( "Prob: 1t=%f\n", prob1Tail[i] );
}
exit(0);
}
[LISTING THREE]
#include<stdio.h>
#include<math.h>
#define NumOfAngles 30
#define NumOfSMAngles 7
#define TEA {0,0,0,0,0,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,\
40,40,40,50,50,50,50,50}
#define TTA {-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,\
-10,-5,0,5,-10,-5,0,5,10}
#define TLL {23,11,10,10,20,19,9,13,10,24,20,11,11,13,35,22,13,17,14,25,\
34,13,9,10,19,22,10,8,11,25}
#define nBSLoops 1000000
main()
{
int i, j, k, index, numOver[NumOfAngles], temp, alpha;
int iseed = -534;
int SkyMap[NumOfSMAngles], FakeSkyMap[NumOfSMAngles];
int EarthAngle[NumOfAngles]=TEA;
int TeleAngle[NumOfAngles]=TTA;
int LightLevel[NumOfAngles]=TLL;
float sum[NumOfSMAngles], sumSq[NumOfSMAngles], mean[NumOfSMAngles];
float sdev[NumOfSMAngles], prob1Tail[NumOfSMAngles], fsm;
extern float ran1();
/*First, make sure certain arrays are zero*/
for( i=0; i<NumOfSMAngles; i++ )
{
sum[i] = 0;
sumSq[i] = 0;
numOver[i] = 0;
SkyMap[i] = 0;
}
/*Make the Measured SkyMap*/
for( i=0; i<NumOfAngles; i++ )
{
alpha = (EarthAngle[i]+TeleAngle[i])/10+1;
if( (alpha >= 0) && (alpha < NumOfSMAngles) )
SkyMap[ alpha ] += LightLevel[i];
}
/*With the measured SkyMap in hand, begin generating background maps*/
for( k=0; k<nBSLoops; k++ )
{
/*Make a fake EarthAngle array by performing Knuth shuffling*/
j = NumOfAngles;
do
{
index = (int)( (float)j * ran1( &iseed ) );
temp = EarthAngle[ index ];
EarthAngle[ index ] = EarthAngle[ j ];
EarthAngle[ j ] = temp;
j--;
}
while( j > 0 );
/*Zero out the FakeSkyMap*/
for( i=0; i<NumOfSMAngles; i++ )
FakeSkyMap[i] = 0;
/*Make the Fake SkyMap*/
for( i=0; i<NumOfAngles; i++ )
{
alpha = (EarthAngle[i]+TeleAngle[i])/10+1;
if( (alpha >= 0) && (alpha < NumOfSMAngles) )
FakeSkyMap[ alpha ] += LightLevel[i];
}
/*Find the components needed later for mean and probability*/
for( j=0; j<NumOfSMAngles; j++ )
{
fsm = (float)FakeSkyMap[j];
sum[j] += fsm;
sumSq[j] += fsm*fsm;
if( FakeSkyMap[j] >= SkyMap[j] )
numOver[j]++;
}
}
for( i=0; i<NumOfSMAngles; i++ )
{
/*The probability of random occurance*/
prob1Tail[i] = ((float) numOver[i])/(float)nBSLoops;
/*Find the mean and stdev*/
mean[i] = sum[i]/nBSLoops;
temp = (sumSq[i] - mean[i]*mean[i]*nBSLoops)/( nBSLoops-1.0 );
sdev[i] = sqrt( temp );
}
/*First, look at the observed distribution*/
for( i=0; i<NumOfSMAngles; i++ )
printf( "%d %d\n", (10*(i-1)), SkyMap[i] );
/*Then, the predicted*/
for( i=0; i<NumOfSMAngles; i++ )
printf( "%d %f %f\n", (10*(i-1)), mean[i], sdev[i] );
/*Now, the random chance probability*/
for( i=0; i<NumOfSMAngles; i++ )
printf( "alpha:%d Prob=%f\n", (10*(i-1)), prob1Tail[i] );
exit(0);
}
[LISTING FOUR]
#include<stdio.h>
#include<math.h>
#define NumOfAngles 30
#define NumOfSMAngles 7
#define TotalLightLevel 481
#define TEA {0,0,0,0,0,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,\
40,40,40,50,50,50,50,50}
#define TTA {-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,-10,-5,0,5,10,\
-10,-5,0,5,-10,-5,0,5,10}
#define TLL {23,11,10,10,20,19,9,13,10,24,20,11,11,13,35,22,13,17,14,25,\
34,13,9,10,19,22,10,8,11,25}
#define nBSLoops 100000
main()
{
int i, j, k, index, numOver[NumOfAngles], temp, alpha;
int iseed = -534;
int SkyMap[NumOfSMAngles], FakeSkyMap[NumOfSMAngles];
int EarthAngle[NumOfAngles]=TEA, DEarthAngle[TotalLightLevel];
int TeleAngle[NumOfAngles]=TTA, DTeleAngle[TotalLightLevel];
int LightLevel[NumOfAngles]=TLL;
float sum[NumOfSMAngles], mean[NumOfSMAngles];
float prob1Tail[NumOfSMAngles];
extern float ran1();
/*First, make sure certain arrays are zero*/
for( i=0; i<NumOfSMAngles; i++ )
{
sum[i] = 0;
sumSq[i] = 0;
numOver[i] = 0;
SkyMap[i] = 0;
}
/*Make the Measured SkyMap*/
for( i=0; i<NumOfAngles; i++ )
{
alpha = (EarthAngle[i]+TeleAngle[i])/10+1;
if( (alpha >= 0) && (alpha < NumOfSMAngles) )
SkyMap[ alpha ] += LightLevel[i];
}
/*Make the discrete angle sets*/
k = 0;
for( i=0; i<NumOfAngles; i++ )
for( j=0; j<LightLevel[i]; j++ )
{
DEarthAngle[k] = EarthAngle[i];
DTeleAngle[k] = TeleAngle[i];
k++;
}
/*With the measured SkyMap in hand, begin generating background maps*/
for( k=0; k<nBSLoops; k++ )
{
/*Make a fake EarthAngle array by performing Knuth shuffling*/
j = TotalLightLevel;
do
{
index = (int)( (float)j * ran1( &iseed ) );
temp = DEarthAngle[ index ];
DEarthAngle[ index ] = DEarthAngle[ j ];
DEarthAngle[ j ] = temp;
j--;
}
while( j > 0 );
/*Zero out the FakeSkyMap*/
for( i=0; i<NumOfSMAngles; i++ )
FakeSkyMap[i] = 0;
/*Make the Fake SkyMap*/
for( i=0; i<TotalLightLevel; i++ )
{
alpha = (DEarthAngle[i]+DTeleAngle[i])/10+1;
if( (alpha >= 0) && (alpha < NumOfSMAngles) )
FakeSkyMap[ alpha ]++;
}
/*Find the components needed later for mean and probability*/
for( j=0; j<NumOfSMAngles; j++ )
{
sum[j] += (float)FakeSkyMap[j];
if( FakeSkyMap[j] >= SkyMap[j] )
numOver[j]++;
}
}
for( i=0; i<NumOfSMAngles; i++ )
{
/*The probability of random occurance*/
prob1Tail[i] = ((float) numOver[i])/(float)nBSLoops;
/*Find the mean*/
mean[i] = sum[i]/nBSLoops;
}
/*First, look at the observed distribution*/
for( i=0; i<NumOfSMAngles; i++ )
printf( "%d %d\n", (10*(i-1)), SkyMap[i] );
/*Then, the predicted*/
for( i=0; i<NumOfSMAngles; i++ )
printf( "%d %f %f\n", (10*(i-1)), mean[i], sqrt( mean[i] ) );
/*Now, the random chance probability*/
for( i=0; i<NumOfSMAngles; i++ )
printf( "alpha:%d Prob=%f\n", (10*(i-1)), prob1Tail[i] );
exit(0);
}
End Listings