Genetic Algorithms
By Satinder Singh, Peter Norvig, and David Cohn
Dr. Dobb's Journal March 1997
Figure 3: (a) Utility (over a finite agent lifetime), defined as the expected sum of the immediate reward and the long-term reward under the best possible policy. st is the state at time step t, Reward(st,a) is the immediate reward of executing action a in state st, N is the number of steps in the lifetime of the agent, and Rewardt is the reward at time step t. The operator E{.} stands for taking an expectation over all sources of randomness in the system; (b) utility (over an infinite lifetime), defined similarly to (a). To avoid the mathematical awkwardness of infinite sums, we introduce a discount factor γ, with 0 ≤ γ < 1, which weights future rewards less than immediate rewards. This is similar to the compound interest that banks use: a reward received k steps in the future is worth only γ^k times an equal reward received now.
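The two utility definitions the caption describes can be written out as follows. This is a reconstruction from the caption's own symbols, since the original figure equations are not reproduced here; the exact form in the figure may differ slightly in notation:

```latex
% (a) Utility over a finite lifetime of N steps: the expected sum of the
% immediate reward and all subsequent rewards under the best policy.
U(s_t) = \max_{a} \, E\!\left\{ \mathrm{Reward}(s_t, a)
          + \sum_{k=t+1}^{N} \mathrm{Reward}_k \right\}

% (b) Utility over an infinite lifetime: future rewards are weighted by
% the discount factor \gamma (0 <= \gamma < 1) so the sum converges.
U(s_t) = \max_{a} \, E\!\left\{ \mathrm{Reward}(s_t, a)
          + \sum_{k=t+1}^{\infty} \gamma^{\,k-t} \, \mathrm{Reward}_k \right\}
```

Because γ < 1, the geometric weighting in (b) guarantees the infinite sum is finite whenever the per-step rewards are bounded.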
Copyright © 1997, Dr. Dobb's Journal