Create Markov decision process model
Since R2019a
Description
MDP = createMDP(states,actions) creates a Markov decision process model with the specified states and actions.
Examples
Create MDP Model
Create an MDP model with eight states and two possible actions.
MDP = createMDP(8,["up";"down"]);
Specify the state transitions and their associated rewards.
% State 1 transition and reward
MDP.T(1,2,1) = 1; MDP.R(1,2,1) = 3;
MDP.T(1,3,2) = 1; MDP.R(1,3,2) = 1;

% State 2 transition and reward
MDP.T(2,4,1) = 1; MDP.R(2,4,1) = 2;
MDP.T(2,5,2) = 1; MDP.R(2,5,2) = 1;

% State 3 transition and reward
MDP.T(3,5,1) = 1; MDP.R(3,5,1) = 2;
MDP.T(3,6,2) = 1; MDP.R(3,6,2) = 4;

% State 4 transition and reward
MDP.T(4,7,1) = 1; MDP.R(4,7,1) = 3;
MDP.T(4,8,2) = 1; MDP.R(4,8,2) = 2;

% State 5 transition and reward
MDP.T(5,7,1) = 1; MDP.R(5,7,1) = 1;
MDP.T(5,8,2) = 1; MDP.R(5,8,2) = 9;

% State 6 transition and reward
MDP.T(6,7,1) = 1; MDP.R(6,7,1) = 5;
MDP.T(6,8,2) = 1; MDP.R(6,8,2) = 1;

% State 7 transition and reward
MDP.T(7,7,1) = 1; MDP.R(7,7,1) = 0;
MDP.T(7,7,2) = 1; MDP.R(7,7,2) = 0;

% State 8 transition and reward
MDP.T(8,8,1) = 1; MDP.R(8,8,1) = 0;
MDP.T(8,8,2) = 1; MDP.R(8,8,2) = 0;
Specify the terminal states of the model.
MDP.TerminalStates = ["s7";"s8"];
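Once the transitions, rewards, and terminal states are defined, you can pass the model to rlMDPEnv to obtain a reinforcement learning environment. The following is a minimal sketch, assuming the MDP object defined above and Reinforcement Learning Toolbox.
% Create a reinforcement learning environment from the MDP model
env = rlMDPEnv(MDP);

% Inspect the observation and action specifications of the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);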
Input Arguments
states — Model states
positive integer | string vector
Model states, specified as one of the following:
Positive integer — Specify the number of model states. In this case, each state has a default name, such as "s1" for the first state.
String vector — Specify the state names. In this case, the total number of states is equal to the length of the vector.
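For example, a model created from a positive integer receives default state names, which you can inspect through the States property (a quick check using the eight-state model from the example above):
% Create the model from a state count; default state names are assigned
MDP = createMDP(8,["up";"down"]);
MDP.States    % default names "s1" through "s8"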
actions — Model actions
positive integer | string vector
Model actions, specified as one of the following:
Positive integer — Specify the number of model actions. In this case, each action has a default name, such as "a1" for the first action.
String vector — Specify the action names. In this case, the total number of actions is equal to the length of the vector.
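For example, the following call names both the states and the actions explicitly (the names here are illustrative):
% Three named states and three named actions
MDP = createMDP(["s1";"s2";"s3"],["left";"right";"stay"]);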
Output Arguments
MDP — MDP model
GenericMDP object
MDP model, returned as a GenericMDP object with the following properties.
CurrentState — Name of the current state
string
Name of the current state, specified as a string.
States — State names
string vector
State names, specified as a string vector with length equal to the number of states.
Actions — Action names
string vector
Action names, specified as a string vector with length equal to the number of actions.
T — State transition matrix
3-D array
State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. State transition matrix T is a probability matrix that indicates how likely the agent will move from the current state s to any possible next state s' by performing action a. T is an S-by-S-by-A array, where S is the number of states and A is the number of actions. It is given by:
T(s,s',a) = probability(s' | s,a)
The transition probabilities out of a nonterminal state s for a given action must sum to one. Therefore, all stochastic transitions out of a given state must be specified at the same time.
For example, to indicate that in state 1, following action 4, there is an equal probability of moving to state 2 or state 3, use the following:
MDP.T(1,[2 3],4) = [0.5 0.5];
You can also specify that, following an action, there is some probability of remaining in the same state. For example:
MDP.T(1,[1 2 3 4],1) = [0.25 0.25 0.25 0.25];
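Because every state-action pair must have outgoing probabilities that sum to one, a check such as the following can catch misspecified transitions (a minimal sketch, assuming all transitions of the model have already been filled in):
% Verify that the outgoing transition probabilities sum to one
% for every state-action pair (within a small tolerance)
probSums = sum(MDP.T, 2);    % S-by-1-by-A array of row sums
assert(all(abs(probSums(:) - 1) < 1e-10), ...
    "Transition probabilities must sum to one for each state and action.")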
R — Reward transition matrix
3-D array
Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as state transition matrix T. The reward for moving from state s to state s' by performing action a is given by:
r = R(s,s',a)
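For example, to assign different rewards to the two equally likely transitions defined above for state 1 and action 4 (the reward values here are illustrative):
% Reward of 5 for reaching state 2 and -1 for reaching state 3
MDP.R(1,[2 3],4) = [5 -1];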
TerminalStates — Terminal state names
string vector
Terminal state names of the model, specified as a string vector of state names.
Version History
Introduced in R2019a
See Also