Markov decision processes (MDPs) are at the core of reinforcement learning theory. Similar to Markov chains, MDPs involve an underlying Markovian process that evolves from one state to another, with the probability of visiting a new state being dependent on the most recent state. Different from Markov chains, MDPs involve both agents and actions taken by these agents. As a result, the next state is dependent on which action was chosen at the state preceding it. MDPs therefore provide a powerful framework to explore state spaces and to learn from actions and rewards.
Review the options below to login to check your access.
Log in with your Cambridge Higher Education account to check access.
If you believe you should have access to this content, please contact your institutional librarian or consult our FAQ page for further information about accessing our content.