In the previous two chapters, we derived procedures for assessing the performance of the strategies (policies) followed by agents interacting with a Markov decision process (MDP), as well as procedures for obtaining optimal policies. Among other methods, we discussed the policy evaluation algorithm (44.116), and the value iteration and policy iteration algorithms (45.23) and (45.43), respectively.
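As a brief reminder of what these recursions compute, the following is a minimal Python sketch for a small tabular MDP whose transition probabilities and rewards are known. It uses the standard forms of iterative policy evaluation, value iteration, and policy iteration with discount factor `gamma`. The data layout `P[s][a]` (a list of `(next_state, probability, reward)` triples), the function names, the stopping tolerance, and the two-state example are illustrative assumptions, so the notation need not match (44.116), (45.23), and (45.43) exactly.

```python
from typing import List, Tuple

# Assumed (hypothetical) layout: P[s][a] lists (next_state, probability, reward) triples.
Transitions = List[List[List[Tuple[int, float, float]]]]


def q_values(P: Transitions, v: List[float], gamma: float) -> List[List[float]]:
    """q(s, a) = sum over s' of p(s'|s,a) * (r + gamma * v(s'))."""
    return [[sum(p * (r + gamma * v[s2]) for s2, p, r in P[s][a])
             for a in range(len(P[s]))]
            for s in range(len(P))]


def policy_evaluation(P, policy, gamma, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration of the Bellman equation for a given policy.
    policy[s][a] is the probability of choosing action a in state s."""
    v = [0.0] * len(P)
    for _ in range(max_iter):
        q = q_values(P, v, gamma)
        v_new = [sum(pi_a * q_sa for pi_a, q_sa in zip(policy[s], q[s]))
                 for s in range(len(P))]
        converged = max(abs(a - b) for a, b in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break
    return v


def greedy(P, v, gamma):
    """Action with the largest q-value in each state (ties go to the lowest index)."""
    return [max(range(len(qs)), key=qs.__getitem__) for qs in q_values(P, v, gamma)]


def value_iteration(P, gamma, tol=1e-10, max_iter=10_000):
    """Iterate v(s) <- max_a q(s, a); return the values and a greedy policy."""
    v = [0.0] * len(P)
    for _ in range(max_iter):
        v_new = [max(qs) for qs in q_values(P, v, gamma)]
        converged = max(abs(a - b) for a, b in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break
    return v, greedy(P, v, gamma)


def policy_iteration(P, gamma, max_iter=1_000):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    actions = [0] * len(P)  # start from "always take action 0"
    for _ in range(max_iter):
        one_hot = [[1.0 if a == actions[s] else 0.0 for a in range(len(P[s]))]
                   for s in range(len(P))]
        v = policy_evaluation(P, one_hot, gamma)
        improved = greedy(P, v, gamma)
        if improved == actions:
            break
        actions = improved
    return v, actions


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
P = [
    [[(0, 1.0, 0.0)], [(1, 1.0, 1.0)]],  # state 0: stay (reward 0) or move (reward 1)
    [[(1, 1.0, 2.0)], [(0, 1.0, 0.0)]],  # state 1: stay (reward 2) or move (reward 0)
]
gamma = 0.9
print(policy_evaluation(P, [[0.5, 0.5], [0.5, 0.5]], gamma))  # ≈ [7.25, 7.75]
print(value_iteration(P, gamma))   # ≈ ([19.0, 20.0], [1, 0])
print(policy_iteration(P, gamma))  # ≈ ([19.0, 20.0], [1, 0])
```

In this example, the optimal strategy is to move from state 0 to state 1 and then remain there, which gives $v^\star(1) = 2/(1-0.9) = 20$ and $v^\star(0) = 1 + 0.9 \times 20 = 19$. Both value iteration and policy iteration recover this policy; policy iteration needs only two improvement steps here.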