In most multistage decision problems, we are interested in determining the optimal strategy, π⋆(a|s) (i.e., the optimal actions to follow in the state–action space). Most of the algorithms described in the previous chapters focused on evaluating the state and state–action value functions, v_π(s) and q_π(s,a), for a given policy π(a|s). More is needed to learn the optimal policy.