Genetic Algorithms
By Satinder Singh, Peter Norvig, and David Cohn
Dr. Dobb's Journal March 1997
Figure 6: The program's experience consists of a trajectory through state space. At time step t, the state is s_t and the agent faces a choice of actions. The action the agent chooses to execute at step t is a_t. The reward at step t, Reward_t, is a function of s_t and a_t. The next state s_{t+1} depends on s_t, a_t, and many random events, such as passengers arriving at floors and pushing buttons. Reinforcement learning allows a program to use such a trajectory to incrementally improve its policy.
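The trajectory described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's elevator controller: the four-floor toy environment, the action names, the reward values, and the helper functions reward(), next_state(), and run_trajectory() are all hypothetical. The update rule is tabular Q-learning, chosen here as one common way to improve a policy incrementally from a trajectory; the caption itself does not name a specific algorithm.

```python
import random
from collections import defaultdict

ACTIONS = ("up", "down", "stop")  # hypothetical action set
FLOORS = 4                        # hypothetical building size


def reward(state, action):
    """Reward_t depends only on s_t and a_t (values are assumptions)."""
    floor, call = state
    return 1.0 if (action == "stop" and floor == call) else -1.0


def next_state(state, action, rng):
    """s_{t+1} depends on s_t, a_t, and a random event (a new call)."""
    floor, call = state
    if action == "up":
        floor = min(floor + 1, FLOORS - 1)
    elif action == "down":
        floor = max(floor - 1, 0)
    elif floor == call:
        # The car stopped at the calling floor; a new passenger
        # pushes a button on a randomly chosen floor.
        call = rng.randrange(FLOORS)
    return (floor, call)


def run_trajectory(q, steps, rng, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Follow an epsilon-greedy policy, updating q after every step."""
    state = (0, rng.randrange(FLOORS))
    trajectory = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # occasional exploration
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        nxt = next_state(state, action, rng)
        # Q-learning update: move the estimate toward the observed
        # reward plus the discounted value of the best next action.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        trajectory.append((state, action, r, nxt))
        state = nxt
    return trajectory


rng = random.Random(0)   # fixed seed so runs are repeatable
q = defaultdict(float)   # value estimates, all starting at 0.0
trajectory = run_trajectory(q, 5000, rng)
```

Each entry of trajectory is a tuple (s_t, a_t, Reward_t, s_{t+1}), so the next state of one entry is the state of the following entry. The policy is implicit in q: in any state, the agent prefers the action with the highest estimated value, and those estimates shift a little after every step.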
Copyright © 1997, Dr. Dobb's Journal