Genetic Algorithms
By Satinder Singh, Peter Norvig, and David Cohn
Dr. Dobb's Journal March 1997
Figure 3: (a) Utility (over a finite agent lifetime), defined as the expected sum of the immediate reward and the long-term reward under the best possible policy. st is the state at time step t, Reward(st,a) is the immediate reward of executing action a in state st, N is the number of steps in the lifetime of the agent, and Rewardt is the reward at time step t. The operator E{.} stands for taking an expectation over all sources of randomness in the system; (b) utility (over an infinite lifetime), defined similarly to (a). To avoid the mathematical awkwardness of infinite sums, we introduce a discount factor γ, with 0 ≤ γ < 1, which weights future rewards less than immediate rewards. This is similar to the compound interest that banks use: a reward received k steps in the future is worth only γ^k times an equal reward received now.
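The two utility definitions the caption describes can be written out as follows. This is a reconstruction from the caption's own symbols, since the original figure equations are not reproduced here; the exact form in the figure may differ slightly in notation:

```latex
% (a) Utility over a finite lifetime of N steps: the expected sum of the
% immediate reward and all subsequent rewards under the best policy.
U(s_t) = \max_{a} \, E\!\left\{ \mathrm{Reward}(s_t, a)
          + \sum_{k=t+1}^{N} \mathrm{Reward}_k \right\}

% (b) Utility over an infinite lifetime: future rewards are weighted by
% the discount factor \gamma (0 <= \gamma < 1) so the sum converges.
U(s_t) = \max_{a} \, E\!\left\{ \mathrm{Reward}(s_t, a)
          + \sum_{k=t+1}^{\infty} \gamma^{\,k-t} \, \mathrm{Reward}_k \right\}
```

Because γ < 1, the geometric weighting in (b) guarantees the infinite sum is finite whenever the per-step rewards are bounded.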
Copyright © 1997, Dr. Dobb's Journal