The various reinforcement learning algorithms described in the last two chapters rely on estimating state values, vπ(s), or state–action values, qπ(s,a), directly.