Create Markov decision process model
Since R2019a
Description
MDP = createMDP(states,actions) creates a Markov decision process model with the specified states and actions.
Examples
Create MDP Model
Create an MDP model with eight states and two possible actions.
MDP = createMDP(8,["up";"down"]);
Specify the state transitions and their associated rewards.
% State 1 transition and reward
MDP.T(1,2,1) = 1; MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1; MDP.R(1,3,2) = 1;

% State 2 transition and reward
MDP.T(2,4,1) = 1; MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1; MDP.R(2,5,2) = 1;

% State 3 transition and reward
MDP.T(3,5,1) = 1; MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1; MDP.R(3,6,2) = 4;

% State 4 transition and reward
MDP.T(4,7,1) = 1; MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1; MDP.R(4,8,2) = 2;

% State 5 transition and reward
MDP.T(5,7,1) = 1; MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1; MDP.R(5,8,2) = 9;

% State 6 transition and reward
MDP.T(6,7,1) = 1; MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1; MDP.R(6,8,2) = 1;

% State 7 transition and reward
MDP.T(7,7,1) = 1; MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1; MDP.R(7,7,2) = 0;

% State 8 transition and reward
MDP.T(8,8,1) = 1; MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1; MDP.R(8,8,2) = 0;
Specify the terminal states of the model.
MDP.TerminalStates = ["s7";"s8"];
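Once the transitions, rewards, and terminal states are defined, you can pass the model to rlMDPEnv to obtain a reinforcement learning environment. The following is a minimal sketch, assuming the MDP object defined above and Reinforcement Learning Toolbox.
% Create a reinforcement learning environment from the MDP model
env = rlMDPEnv(MDP);

% Inspect the observation and action specifications of the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);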
Input Arguments
states — Model states
positive integer | string vector
Model states, specified as one of the following:
Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.
String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.
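For example, a model created from a positive integer receives default state names, which you can inspect through the States property (a quick check using the eight-state model from the example above):
% Create the model from a state count; default state names are assigned
MDP = createMDP(8,["up";"down"]);
MDP.States    % default names "s1" through "s8"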
actions — Model actions
positive integer | string vector
Model actions, specified as one of the following:
Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.
String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
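For example, the following call names both the states and the actions explicitly (the names here are illustrative):
% Three named states and three named actions
MDP = createMDP(["s1";"s2";"s3"],["left";"right";"stay"]);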
Output Arguments
MDP — MDP model
GenericMDP object
MDP model, returned as a GenericMDP object with the following properties.
CurrentState — Name of the current state
string
Name of the current state, specified as a string.
States — State names
string vector
State names, specified as a string vector with length equal to the number of states.
Actions — Action names
string vector
Action names, specified as a string vector with length equal to the number of actions.
T — State transition matrix
3-D array
State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. State transition matrix T is a probability matrix that indicates how likely the agent will move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:
T(s,s',a) = probability(s' | s,a)
The transition probabilities out of a nonterminal state s for a given action must sum to one. Therefore, all stochastic transitions out of a given state must be specified at the same time.
For example, to indicate that in state 1, following action 4, there is an equal probability of moving to state 2 or state 3, use the following:
MDP.T(1,[2 3],4) = [0.5 0.5];
You can also specify that, following an action, there is some probability of remaining in the same state. For example:
MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
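Because every state-action pair must have outgoing probabilities that sum to one, a check such as the following can catch misspecified transitions (a minimal sketch, assuming all transitions of the model have already been filled in):
% Verify that the outgoing transition probabilities sum to one
% for every state-action pair (within a small tolerance)
probSums = sum(MDP.T, 2);    % S-by-1-by-A array of row sums
assert(all(abs(probSums(:) - 1) < 1e-10), ...
    "Transition probabilities must sum to one for each state and action.")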
R — Reward transition matrix
3-D array
Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:
r = R(s,s',a)
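For example, to assign different rewards to the two equally likely transitions defined above for state 1 and action 4 (the reward values here are illustrative):
% Reward of 5 for reaching state 2 and -1 for reaching state 3
MDP.R(1,[2 3],4) = [5 -1];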
TerminalStates — Terminal state names
string vector
Terminal state names of the model, specified as a string vector of state names.
Version History
Introduced in R2019a
See Also