createGridWorld

Create a two-dimensional grid world for reinforcement learning

Since R2019a

Syntax

gw = createGridWorld(m,n)

gw = createGridWorld(m,n,moves)

Description


gw = createGridWorld(m,n) creates a grid world gw of size m-by-n with the default actions ['N';'S';'E';'W'].

gw = createGridWorld(m,n,moves) creates a grid world gw of size m-by-n with the actions specified by moves.
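
As a quick check of the two syntaxes, the following sketch creates one grid world with each move set and compares the resulting action counts. The 3-by-4 grid size and the variable names gwStd and gwKings are arbitrary choices for illustration, and the code assumes Reinforcement Learning Toolbox is installed.

```matlab
% Default ('Standard') moves: four actions
gwStd = createGridWorld(3,4);

% 'Kings' moves: eight actions, including the diagonals
gwKings = createGridWorld(3,4,'Kings');

disp(numel(gwStd.Actions))    % number of actions for 'Standard'
disp(numel(gwKings.Actions))  % number of actions for 'Kings'
disp(size(gwKings.T))         % k-by-k-by-8, where k = 3*4
```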

Examples

Create Grid World Environment

For this example, consider a 5-by-5 grid world with the following rules:

A 5-by-5 grid world bounded by borders, with 4 possible actions (North = 1, South = 2, East = 3, West = 4).
The agent begins from cell [2,1] (second row, first column).
The agent receives reward 10 if it reaches the terminal state at cell [5,5] (blue).
The environment contains a special jump from cell [2,4] to cell [4,4] with reward 5.
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in reward -1.

First, create a GridWorld object using the createGridWorld function.

gw = createGridWorld(5,5)

gw = 
  GridWorld with properties:
                GridSize: [5 5]
            CurrentState: "[1,1]"
                  States: [25x1 string]
                 Actions: [4x1 string]
                       T: [25x25x4 double]
                       R: [25x25x4 double]
          ObstacleStates: [0x1 string]
          TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16

Now, set the initial, terminal, and obstacle states.

gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];

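Update the state transition matrix for the obstacle states, and then set the jump rule over the obstacle states. The jump rule first clears every transition out of cell [2,4] and then makes cell [4,4] the certain next state for every action.

updateStateTranstionForObstacles(gw)
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;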

Next, define the rewards in the reward transition matrix.

nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;
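
One way to confirm the reward entries is to read them back. The sketch below rebuilds only the reward part of this example and then looks up three transitions. The source cells [4,5] and [1,1] and the action indices used for the lookups are arbitrary choices for illustration, not part of the original example.

```matlab
% Rebuild the reward matrix from the example on a fresh grid world
gw = createGridWorld(5,5);
gw.TerminalStates = '[5,5]';
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;

% Reward for the special jump from [2,4] to [4,4], first action
jumpReward = gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),1);

% Reward for entering the terminal state [5,5] from [4,5], second action
terminalReward = gw.R(state2idx(gw,"[4,5]"),state2idx(gw,"[5,5]"),2);

% Reward for an ordinary transition from [1,1] to [1,2], third action
defaultReward = gw.R(state2idx(gw,"[1,1]"),state2idx(gw,"[1,2]"),3);

disp([jumpReward terminalReward defaultReward])
```

Because the last assignment sets an entire column of R to 10, every transition into the terminal state receives that reward, regardless of the source state or action.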

Now, use rlMDPEnv to create a grid world environment using the GridWorld object gw.

env = rlMDPEnv(gw)

env = 
  rlMDPEnv with properties:
       Model: [1x1 rl.env.GridWorld]
    ResetFcn: []

You can visualize the grid world environment using the plot function.

plot(env)
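
To interact with the environment outside of training, you can reset it and take a single step. The following is a minimal sketch, assuming that reset and step behave for an rlMDPEnv object as they do for other Reinforcement Learning Toolbox MATLAB environments, and that ResetFcn accepts a function handle returning a state index. The start cell [2,1] comes from the example rules; action 3 is an illustrative choice.

```matlab
% Build a plain 5-by-5 grid world environment
gw = createGridWorld(5,5);
env = rlMDPEnv(gw);

% Assumption: ResetFcn returns the index of the initial state
startIdx = state2idx(gw,"[2,1]");
env.ResetFcn = @() startIdx;

% Reset the environment, then apply action 3 once
obs = reset(env);
[nextObs,reward,isDone] = step(env,3);

disp(obs)
disp(nextObs)
disp(reward)
disp(isDone)
```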

Input Arguments

`m` — Number of rows of the grid world
scalar

Number of rows of the grid world, specified as a scalar.

`n` — Number of columns of the grid world
scalar

Number of columns of the grid world, specified as a scalar.

`moves` — Action names
`'Standard'` (default) | `'Kings'`

Action names, specified as either 'Standard' or 'Kings'. When moves is set to:

'Standard', the actions are ['N';'S';'E';'W'].
'Kings', the actions are ['N';'S';'E';'W';'NE';'NW';'SE';'SW'].

Output Arguments

`gw` — Two-dimensional grid world
`GridWorld` object

Two-dimensional grid world, returned as a GridWorld object with the properties listed below. For more information, see Create Custom Grid World Environments.

`GridSize` — Size of the grid world
`[m,n]` vector

Size of the grid world, specified as an [m,n] vector.

`CurrentState` — Name of the current state
string

Name of the current state, specified as a string.

`States` — State names
string vector

State names, specified as a string vector of length m*n.

`Actions` — Action names
string vector

Action names, specified as a string vector. The length of the Actions vector is determined by the moves argument.

Actions is a string vector of length:

Four, if moves is specified as 'Standard'.
Eight, if moves is specified as 'Kings'.

`T` — State transition matrix
3-D array

State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is given by

$T(s, s', a) = \text{probability}(s' \mid s, a).$

T is:

A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'Kings'.
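
Because T holds conditional probabilities, its entries for a fixed state s and action a are expected to sum to 1 over all next states s'. The sketch below checks this on a freshly created grid world. It assumes the default transition matrix is a valid probability matrix; the variable names rowSums and maxDeviation are illustrative.

```matlab
gw = createGridWorld(5,5);

% Sum over the next-state dimension (dimension 2) for every (s,a) pair
rowSums = sum(gw.T,2);

% Largest deviation from 1 across all states and actions
maxDeviation = max(abs(rowSums(:) - 1));
disp(maxDeviation)
```

If you edit T by hand, as in the jump rule of the example, a check like this can help catch rows that no longer sum to 1.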

`R` — Reward transition matrix
3-D array

Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward r for moving from state s to state s' by performing action a is given by

$r = R(s, s', a).$

R is:

A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'Kings'.

`ObstacleStates` — State names that cannot be reached in the grid world
string vector

State names that cannot be reached in the grid world, specified as a string vector.

`TerminalStates` — Terminal state names in the grid world
string vector

Terminal state names in the grid world, specified as a string vector.

Version History

Introduced in R2019a
