Markov Decision Process environment.

Arguments

transitions

[array (n.states x n.states x n.actions)] State transition array. The slice for each action is an n.states x n.states matrix whose rows give the probabilities of transitioning from each state to every other state under that action.
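As a sketch, a two-state, two-action transition array (the same one used in the Examples below); the stopifnot check encodes the standard requirement that the transition probabilities out of each state sum to 1 for every action:

# P[i, j, a]: probability of moving from the i-th to the j-th state
# when taking the a-th action.
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
# Each row of every action slice must sum to 1.
stopifnot(all(abs(apply(P, 3, rowSums) - 1) < 1e-8))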

rewards

[matrix (n.states x n.actions)] Reward matrix, with one row per state and one column per action.
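Note that states are numbered from 0, while R matrices are indexed from 1; assuming actions are likewise numbered from 0 (consistent with the step output in the Examples below), the reward for taking action a in state s sits at rewards[s + 1, a + 1]. A small sketch reusing the reward matrix from the Examples:

R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
# Reward for taking action 1 in state 0 (cf. env$step(1L) below):
R[0 + 1, 1 + 1]
#> [1] 10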

initial.state

[integer] Optional starting state. If a vector is given, a starting state will be randomly sampled from this vector whenever reset is called. Note that states are numbered starting from 0. If initial.state = NULL, all non-terminal states are possible starting states.
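For example, to sample the starting state from states 0 and 1 on every reset (a sketch reusing P and R as constructed above):

env = makeEnvironment("mdp", transitions = P, rewards = R,
  initial.state = c(0L, 1L))
env$reset()  # returns either 0 or 1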

...

[any] Arguments passed on to makeEnvironment.

Usage

makeEnvironment("MDP", transitions, rewards, initial.state, ...)

Methods

  • $step(action) Take an action in the environment. Returns a list with state, reward, done (see the episode-loop sketch after this list).

  • $reset() Resets the done flag of the environment and returns an initial state. Useful when starting a new episode.

  • $visualize() Visualizes the environment (if there is a visualization function).
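Put together, a minimal episode loop; this is a sketch that reuses P and R as constructed above and assumes that actions, like states, are numbered from 0:

env = makeEnvironment("mdp", transitions = P, rewards = R)
state = env$reset()
repeat {
  action = sample(c(0L, 1L), 1)  # pick a random action
  res = env$step(action)         # list with state, reward, done
  state = res$state
  if (res$done) break            # stop when a terminal state is reached
}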

Examples

# Create a Markov Decision Process.
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
env = makeEnvironment("mdp", transitions = P, rewards = R)
env$reset()
#> [1] 0
env$step(1L)
#> $state
#> [1] 1
#>
#> $reward
#> [1] 10
#>
#> $done
#> [1] TRUE