Artificial Intelligence


Belief Maintenance Using The Dempster-Shafer Theory Of Evidence

Dwayne Phillips


The author works as a computer and electronics engineer with the U.S. Department of Defense and is a doctoral candidate in Electrical and Computer Engineering at Louisiana State University. His interests include computer vision, artificial intelligence, software engineering, and programming languages. He first used the Dempster Shafer theory of evidence in 1984 and uses it extensively in his PhD research into computer vision.

An expert system makes a decision given an amount of evidence. Usually it must choose between several competing answers or hypotheses. The human expert keeps these answers in his mind while he thinks over the problem. He gathers evidence and shifts his thoughts from one answer to another. After gathering evidence, he chooses the most favorable answer. We all do this in our daily decisions, but we don't think about the process, and we certainly don't keep track of specific numbers in our head.

An expert system needs a sub-system to pool evidence and reach decisions: a belief maintenance system. The belief maintenance system keeps track of the hypotheses and the degree of belief attributed to each hypothesis. When the expert system finishes gathering evidence, the belief maintenance system chooses the answer.

In some expert systems a belief maintenance system is not necessary, because some expert systems make decisions based on a single, clear cut piece of evidence. For instance, suppose an expert system has the task of rolling up the windows in your car. The evidence is whether or not it is raining. The system would check the atmosphere and ask, "Is it raining?" If the answer were yes, it would roll up the windows.

In other expert systems a belief maintenance system is essential. Suppose the expert system had to decide at 9.00 AM whether or not to roll up the windows at 3.00 PM. Now the question is tougher. Evidence would include the daily weather forecast, the wind speed and direction, the relative humidity, weather records from past years, forecasts from the Farmer's Almanac, satellite photographs, and other relevant sources. The expert system would pool all the evidence and arrive at an answer.

Consider the nature of evidence. Some evidence is not reliable (the weatherman is wrong sometimes and right sometimes). Some evidence is uncertain (an intermittent atmospheric reading). Some is incomplete (the wind speed by itself does not tell us much). Some evidence is contradictory (the weatherman's forecast and the atmospheric conditions). Finally, some evidence is incorrect (a broken atmospheric sensor or a wrong weather forecast).

The belief maintenance system must deal with these factors, taking the evidence, assigning a measure of belief to each hypothesis, and changing this belief as new evidence becomes available. The resulting decision must be the same regardless of the order in which the system gathers the evidence.

The method of belief maintenance that most of us know is classical probability. The basic properties of this system are [Beyer]:

 A)     P(Æ) = 0 (null set)
 B)     P(Q) = 1 (entire sample set)
 C)     P(A) = 1 - P(A')
 D)     P(AÈB) = P(A) + P(B), if A and B are mutually exclusive
 E)     P(AÇB) = P(A) * P(B), if A and B are mutually exclusive
Another belief maintenance system came from the MYCIN project (a pioneering medical expert system developed in the early seventies by Edward Shortliffe at Stanford.) MYCIN used a system of certainty factors to keep track of hypotheses. Shortliffe later dropped the certainty factor system for the Dempster-Shafer theory of evidence.

The Dempster-Shafer (D-S) theory of evidence was created by Glen Shafer [Shafer, 1976] at Princeton. He built on earlier work performed by Arthur Dempster. The theory is a broad treatment of probabilities, and includes classical probability and Shortliffe's certainty factors as subsets.

In the D-S theory of evidence, the set of all hypotheses that describes a situation is the frame of discernment. The letter Q denotes the frame of discernment. The hypotheses in Q must be mutually exclusive and exhaustive, meaning that they must cover all the possibilities and that the individual hypotheses cannot overlap.

The D-S theory mirrors human resoning by narrowing its reasoning gradually as more evidence becomes available. Two properties of the D-S theory permit this process: the ability to assign belief to ignorance, and the ability to assign belief to subsets of hypotheses.

An example provides the easiest way to understand these properties and how they differ from classical probability. Suppose we want to decide which of three persons in an office — Adam, Bob, and Carol — will come in early to turn on the lights and make coffee.

In the D-S theory the set Q = {Adam or Bob or Carol}. The sets {Adam}, {Bob}, and {Carol} are the mutually exclusive and exhaustive hypotheses. They are singletons. In the frame of discernment there are 2Q or 8 possible interpretations. (Figure 1)

Figure 1 contains two special sets in {Æ} and {Adam, Bob, Carol}. The first is the null set, which cannot hold any value. As later examples will show, the null set normalizes beliefs. The second special set is {Adam or Bob or Carol}, represented by Q. Assigning belief to Q does not help distinguish anything. Therefore, Q represents ignorance.

Representing ignorance is a key concept. Humans often give weight to the hypothesis "I don't know", which is not possible in classical probability. Assigning belief to "I don't know" allows us to delay a decision until more evidence becomes available. This mirrors the human tendency to procrastinate.

Suppose that given a piece of evidence, we make the assertion shown in Figure 2. The D-S theory calls an assertion a basic probability assignment. The M in Figure 2 represents the measure of belief. The assertion of Figure 2 says that we believe Adam is the best choice with a weight of 0.6. We'll give the other 0.4 of belief to Q or "I don't know," thus allowing us to delay deciding on Adam.

We cannot make this type of assertion in classical probability. The classical system's property of complements given earlier forces us to give Adam' 0.4 (the complement of Adam) if we give 0.6 to Adam. In this case Adam' = {Bob or Carol}. Notice the difference between Q ={Adam or Bob or Carol} and Adam' ={Bob or Carol}. Adam' gives more belief to Bob and Carol than we want. Q allows us to express a true "no comment" on the situation.

Assigning belief to subsets in the D-S theory allows us to assign belief to a general concept instead of being too specific. Suppose in our example that the local police advise us that we should not have women coming to work early by themselves. We would make an assertion like that, shown in Figure 3. This assertion gives a weight of 0.7 to the subset {Adam or Bob} and a weight of 0.3 to ignorance or no comment.

Classical probability does not permit a subset assertion. Recall that property D requires P(Adam or Bob) = P(Adam) + P(Bob). That property would force us to assign specific beliefs to Adam and to Bob individually. We do not want to be that specific. We want to procrastinate and think it over some more.

Also, property C would make us assert the 0.3 to the complement of {Adam or Bob} which is {Carol}. We do not want to assert 0.3 to {Carol}. Assigning belief directly to {Carol} would contradict the evidence the police gave us.

The D-S theory employs Dempster's rule of combination to combine two assertions. The mathematical formulas may be found in the references. They confuse the best of us, but they are simple when illustrated. Figure 4 shows how the two assertions combine.

The table in Figure 4 is an intersection tableau. Which lists one assertion across the top and one down the side. Inside the tableau are the intersections of the sets in the rows and columns, with the products assigned to the intersections. The measures of belief inside the table sum to the final values given below the table.

Notice how combination narrows the decision process. The single set {Adam} now has the highest belief. The subset {Adam or Bob} comes in second with no comment last.

Now suppose that we require the first person in the office in the morning to bring up the computer system. Carol is an expert at this so we make the assertion shown in Figure 5. This attributes most of the belief to {Carol}. This new requirement or piece of evidence contradicts the previous evidence given by the police. That is the nature of evidence. Dempster's rule of combination allows us to combine the contradictory evidence and draw a logical conclusion.

Figure 6 shows the combination of the result of Figure 4 and the assertion of Figure 5. Inside the intersection tableau is the null set. There is no intersection between the set {Carol} and the set {Adam} and there is also no intersection between the set {Carol} and the set {Adam, Bob}. The null set cannot hold any value. Therefore, it normalizes the beliefs of the other subsets. The sum of the beliefs of the other subsets is divided by one minus the belief in the null set. The beliefs of all the subsets sum to one. The bottom of Figure 6 shows this extra step.

As a result, Carol is now the choice for coming in early in the morning. If she is unable to do so, then Adam is the logical replacement. If Adam is unavailable, then Bob comes in early.

Implementation

The preceding examples show that no complex mathematics are involved in combining two assertions. Dempster's rule of combination uses simple addition, subtraction, multiplication, and division. The only tricky part is the intersections of the sets in the tableau.

There are several ways to solve the intersection question. Since there are three singletons and 23 total interpretations, we'll represent the hypotheses with three bits as in Figure 7.

Listing 2 shows the C function that combines two assertions. The inputs are two belief vectors, each holding an assertion. The belief vector is a one-dimensional array of floats. In our examples, the LENGTH_OF_BELIEF_VECTOR is eight because we have three singletons and 23=8. The belief vector has a space, or slot, for each hypothesis, ordered as in Figure 7. The belief vector is awkward to initialize since we would like Adam in slot one, not slot four, and Carol in slot three, not slot one. Nevertheless, a uniform belief vector allows a very simple subroutine to combine the assertions.

The first for loop initializes the sum_vector, the belief vector which holds the sums of the values found inside the intersection tableau. sum_vector holds the sums for later when normalization occurs.

The for a loop goes through the belief vectors, finds the intersections, and calculates the products. The two if > 0.0 statements reduce processing time by eliminating unnecessary multiplication by zero. The function uses the C bitwise AND operator & to find the intersection of sets. Without the bitwise AND, the function would be much longer and much more complex.

The last for loop performs the normalization. The values in sum_vector are divided by one minus the value assigned to the null set. The answer is stored in vector1.

The combine_using_dempsters_rule function is the meat of the program written in Turbo C v1.5. I used this compiler because it had a few functions that made the user interface more pleasant. Except for those functions, there is nothing in the program that is machine, compiler, or operating system specific.

One important note about implementing Dempster's rule of combination. The number of calculations depends on 2Q. In our example there were eight hypotheses. Alternatively, 200 single hypotheses would produce 2200 subsets, 2200 slots in the belief vector, and 2200 floating point calculations. This gets out of hand rather quickly. Several of the references [Gordon, Shortliffe 1985] [Shafer 1985] [Shafer 1987] deal exclusively with this topic. The discussion and proposed solutions are beyond the scope of this article.

Conclusion

The Dempster-Shafer theory of evidence is one method that an expert system may use to keep score on competing hypotheses while it gathers evidence and draws a logical conclusion. It is more general and capable than the classical probability with which most of us are familiar. It is easy to implement and executes quickly as long as the number of hypotheses is manageable. I suggest you try it on your next expert system or AI-related project.

References

Beyer, William H., CRC Standard Mathematical Tables, 26th edition, CRC Press, 1983, pp. 503-559.

Gordon, Jean, Edward H. Shortliffe, "The Dempster-Shafer, Theory of Evidence," pp. 272-292 of Shortliffe, Edward H., Bruce G. Buchanan, eds., Rule Based Expert Systems, Addison Wesley Publishing Company, 1984.

Gordon, Jean, Edward H. Shortliffe, "A Method for Managing Evidential Reasoning in a Hierarchical Hypothesis Space," Artificial Intelligence, Vol. 26, No. 3, July 1985, pp. 323-357.

Shortliffe, Edward H., Bruce G. Buchanan, eds., Rule Based Expert Systems, Addison Wesley Publishing Company, 1984.

Shafer, Glen, A Mathematical Theory of Evidence, Princeton University Press, 1976.

Shafer, Glen, "Hierarchical Evidence," The Second Conference on Artificial Intelligence Applications, IEEE Press, December 1985, pp. 16-21.

Shafer, Glen, Roger Logan, "Implementing Dempster's Rule for Hierarchical Evidence," Artificial Intelligence, Vol. 33, No. 3, November 1987, pp. 271-298.