Create a two-dimensional grid world for reinforcement learning
Since R2019a
Description
Examples
Create Grid World Environment
For this example, consider a 5-by-5 grid world with the following rules:
The grid world is bounded by borders and has 4 possible actions (North = 1, South = 2, East = 3, West = 4).
The agent begins from cell [2,1] (second row, first column).
The agent receives a reward of 10 if it reaches the terminal state at cell [5,5] (blue).
The environment contains a special jump from cell [2,4] to cell [4,4] with a reward of 5.
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in a reward of -1.
First, create a GridWorld object using the createGridWorld function.
gw = createGridWorld(5,5)
gw = 
  GridWorld with properties:

                GridSize: [5 5]
            CurrentState: "[1,1]"
                  States: [25x1 string]
                 Actions: [4x1 string]
                       T: [25x25x4 double]
                       R: [25x25x4 double]
          ObstacleStates: [0x1 string]
          TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16
Now, set the initial, terminal, and obstacle states.
gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
Update the state transition matrix for the obstacle states, and set the jump rule over the obstacle states.
updateStateTranstionForObstacles(gw)
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;
Next, define the rewards in the reward transition matrix.
ns = numel(gw.States);
na = numel(gw.Actions);
gw.R = -1*ones(ns,ns,na);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;
Now, use rlMDPEnv to create a grid world environment using the GridWorld object gw.
env = rlMDPEnv(gw)
env = 
  rlMDPEnv with properties:

       Model: [1x1 rl.env.GridWorld]
    ResetFcn: []
You can visualize the grid world environment using the plot function.
plot(env)
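As a rough sketch of how the environment built above might be exercised, the following snippet rebuilds the grid world, fixes the starting state through ResetFcn, and takes a single step east. It assumes Reinforcement Learning Toolbox is installed. The ResetFcn handle, the variable names, and the choice of action are illustrative assumptions and are not part of the original example.

```matlab
% Rebuild the grid world from the example (requires Reinforcement Learning Toolbox).
gw = createGridWorld(5,5);
gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(gw)
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;

nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;

env = rlMDPEnv(gw);

% Assumption: start every episode in cell [2,1] by returning its state index.
startIdx = state2idx(gw,"[2,1]");
env.ResetFcn = @() startIdx;

obs0 = reset(env);                     % initial observation (a state index)
[obs1,reward,isDone] = step(env,3);    % action 3 = East
```

Here the observation is the index of the current state, so state2idx can be used to relate it back to a cell name such as "[2,2]".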
Input Arguments
m — Number of rows of the grid world
scalar
Number of rows of the grid world, specified as a scalar.
n — Number of columns of the grid world
scalar
Number of columns of the grid world, specified as a scalar.
moves — Action names
'Standard' (default) | 'Kings'
Action names, specified as either 'Standard' or 'Kings'. When moves is set to:
'Standard', the actions are ['N';'S';'E';'W'].
'Kings', the actions are ['N';'S';'E';'W';'NE';'NW';'SE';'SW'].
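To make the effect of the moves argument concrete, the following minimal sketch creates the same small grid with both move sets and compares the resulting action count and transition array size. It assumes Reinforcement Learning Toolbox is installed; the 3-by-4 grid size and the variable names are arbitrary choices for illustration.

```matlab
% Compare the two move sets on a 3-by-4 grid (requires Reinforcement Learning Toolbox).
gwStd   = createGridWorld(3,4);            % default 'Standard' moves
gwKings = createGridWorld(3,4,'Kings');    % adds the four diagonal moves

nStd    = numel(gwStd.Actions);      % number of actions with 'Standard' moves
nKings  = numel(gwKings.Actions);    % number of actions with 'Kings' moves
szStd   = size(gwStd.T);             % K-by-K-by-4, with K = 3*4
szKings = size(gwKings.T);           % K-by-K-by-8
```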
Output Arguments
gw — Two-dimensional grid world
GridWorld object
Two-dimensional grid world, returned as a GridWorld object with the properties listed below. For more information, see Create Custom Grid World Environments.
GridSize — Size of the grid world
[m,n] vector
Size of the grid world, specified as an [m,n] vector.
CurrentState — Name of the current state
string
Name of the current state, specified as a string.
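State names have the form "[row,column]", as in the property display shown earlier. The short sketch below, which assumes Reinforcement Learning Toolbox is installed, shows one way to move between a state name and its index in the States vector using state2idx; the variable names are illustrative.

```matlab
% Map a state name to its index and back (requires Reinforcement Learning Toolbox).
gw = createGridWorld(5,5);
gw.CurrentState = '[2,1]';

idx  = state2idx(gw,"[2,1]");   % index of cell [2,1] in gw.States
name = gw.States(idx);          % name stored at that index
```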
Actions — Action names
string vector
Action names, specified as a string vector. The length of the Actions vector is determined by the moves argument. Actions is a string vector of length:
Four, if moves is specified as 'Standard'.
Eight, if moves is specified as 'Kings'.
T — State transition matrix
3-D array
State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is given by
$T(s,s',a) = \Pr(s' \mid s, a)$
T is:
A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'Kings'.
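Because T holds probabilities over next states, each slice T(s,:,a) of a freshly created grid world should sum to 1. The sketch below checks this and looks up the next state for one move. It assumes Reinforcement Learning Toolbox is installed and relies on the action numbering from the example above (East = 3); the variable names are illustrative.

```matlab
% Inspect the default transition array (requires Reinforcement Learning Toolbox).
gw = createGridWorld(5,5);

% Sum over next states s' for every (s,a) pair; the result is K-by-1-by-4.
rowSums = sum(gw.T,2);
maxDeviation = max(abs(rowSums(:) - 1));   % distance of the sums from 1

% Look up the most likely next state when moving East (action 3) from [2,1].
s = state2idx(gw,"[2,1]");
[p,sNext] = max(gw.T(s,:,3));
nextName = gw.States(sNext);
```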
R — Reward transition matrix
3-D array
Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward r for moving from state s to state s' by performing action a is given by
$r = R(s,s',a)$
R is:
A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'Kings'.
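Since R and T share the same indexing, the expected immediate reward for a state-action pair can be obtained by weighting R with T and summing over next states. The following is a minimal sketch under that reading, assuming Reinforcement Learning Toolbox is installed; the reward values and variable names are illustrative.

```matlab
% Combine T and R into expected immediate rewards (requires Reinforcement Learning Toolbox).
gw = createGridWorld(5,5);
nS = numel(gw.States);
nA = numel(gw.Actions);

sGoal = state2idx(gw,"[5,5]");
gw.R = -1*ones(nS,nS,nA);      % step cost for every transition
gw.R(:,sGoal,:) = 10;          % reward for any transition into [5,5]

% Expected reward per (state, action): sum over s' of T(s,s',a)*R(s,s',a).
expectedR = squeeze(sum(gw.T .* gw.R,2));   % nS-by-nA

% Find the action(s) that move the agent from [4,5] into [5,5] with certainty.
sFrom = state2idx(gw,"[4,5]");
aIntoGoal = find(squeeze(gw.T(sFrom,sGoal,:)) == 1);
rIntoGoal = expectedR(sFrom,aIntoGoal);
```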
ObstacleStates — State names that cannot be reached in the grid world
string vector
State names that cannot be reached in the grid world, specified as a string vector.
TerminalStates — Terminal state names in the grid world
string vector
Terminal state names in the grid world, specified as a string vector.
Version History
Introduced in R2019a