This function creates an environment for reinforcement learning.
makeEnvironment(class = "custom", discount = 1, ...)
class | [ |
---|---|
discount | [ |
... | [ |
An R6 object of class Environment.
Use the `step` method to interact with the environment.
Note that all states and actions are numbered starting with 0!
For a detailed explanation and more examples have a look at the vignette "How to create an environment?".
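
Because R vectors are 1-based while states and actions here start at 0, indexing a table by state needs an offset of one. The following is a minimal base-R sketch of that offset; the names `n_states`, `values` and `value_of` are hypothetical and not part of the package.

```r
# Hypothetical value table for an environment with 4 states (0, 1, 2, 3).
n_states = 4
values = c(0.5, 1.0, 1.5, 2.0)

# Look up the value of a 0-based state by shifting to R's 1-based indexing.
value_of = function(state) {
  stopifnot(state >= 0, state < n_states)
  values[state + 1]
}

first = value_of(0)  # first entry of the table
last = value_of(3)   # last entry of the table
```

The same `+ 1` shift applies when an action returned by or passed to the environment is used as an index into an R vector or matrix.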
$step(action)
Take action in environment.
Returns a list with `state`, `reward` and `done`.
$reset()
Resets the `done` flag of the environment and returns an initial state.
Useful when starting a new episode.
$visualize()
Visualizes the environment (if there is a visualization function).
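
The methods above are typically combined into an episode loop: reset, then step until `done` is `TRUE`. The sketch below illustrates this with a stand-in environment written in base R so that it runs without the package; `make_toy_env` is hypothetical and only mimics the shape of `$reset()` and `$step(action)` described here. Applying the discount factor in the loop is likewise only an illustration of what a discounted return is, not a statement about how the package uses `discount`.

```r
# Stand-in environment (NOT the package's implementation): a chain of states
# 0, 1, 2, 3. Every step moves one state to the right (the action is ignored)
# and state 3 is terminal.
make_toy_env = function() {
  state = 0
  done = FALSE
  reset = function() {
    state <<- 0
    done <<- FALSE
    state
  }
  step = function(action) {
    state <<- min(state + 1, 3)
    done <<- state == 3
    list(state = state, reward = -1, done = done)
  }
  list(reset = reset, step = step)
}

env = make_toy_env()
discount = 0.9

# Run one episode and accumulate the discounted return.
state = env$reset()
ret = 0
n_steps = 0
repeat {
  res = env$step(0L)
  ret = ret + discount^n_steps * res$reward
  n_steps = n_steps + 1
  if (res$done) break
}
```

With an object returned by `makeEnvironment`, the loop body would presumably look the same, with `env$reset()` called again before the next episode.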
MountainCar
```r
step = function(self, action) {
  state = list(mean = action + rnorm(1), sd = runif(1))
  reward = rnorm(1, state[[1]], state[[2]])
  done = FALSE
  list(state, reward, done)
}

reset = function(self) {
  state = list(mean = 0, sd = 1)
  state
}

env = makeEnvironment(step = step, reset = reset, discount = 0.9)
env$reset()
#> $mean
#> [1] 0
#>
#> $sd
#> [1] 1
#>
env$step(100)
#> $state
#> $state$mean
#> [1] 98.42394
#>
#> $state$sd
#> [1] 0.9320559
#>
#>
#> $reward
#> [1] 97.7765
#>
#> $done
#> [1] FALSE
#>

# Create a Markov Decision Process.
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
env = makeEnvironment("mdp", transitions = P, rewards = R)
env$reset()
#> [1] 0
env$step(1L)
#> $state
#> [1] 1
#>
#> $reward
#> [1] 10
#>
#> $done
#> [1] TRUE
#>

# Create a Gridworld.
grid = makeEnvironment("gridworld", shape = c(4, 4),
  goal.states = 15, initial.state = 0)
grid$visualize()
#> o - - -
#> - - - -
#> - - - -
#> - - - -
#>

# NOT RUN {
# Create an OpenAI Gym environment.
# Make sure you have Python, gym and reticulate installed.
env = makeEnvironment("gym", gym.name = "MountainCar-v0")

# Take random actions for 200 steps.
env$reset()
for (i in 1:200) {
  action = sample(env$actions, 1)
  env$step(action)
  env$visualize()
}
env$close()
# }
```
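
To see why `env$step(1L)` from state 0 yields state 1 and reward 10 in the Markov Decision Process example, the arrays can be inspected directly in base R. This sketch assumes that `transitions` is indexed as [state, next state, action] and `rewards` as [state, action]; that layout is consistent with the printed output but is not stated in this section. The `+ 1` shifts the 0-based states and actions to R's 1-based indices.

```r
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)

state = 0
action = 1

# Distribution over next states (assumed layout: [state, next state, action]).
next_probs = P[state + 1, , action + 1]
# The transition is deterministic here, so take the most likely next state (0-based).
next_state = which.max(next_probs) - 1
# Reward for taking the action in the state (assumed layout: [state, action]).
reward = R[state + 1, action + 1]
# Under both actions, state 1 only transitions to itself (it is absorbing).
absorbing = all(P[next_state + 1, next_state + 1, ] == 1)
```

Since state 1 is absorbing, it is presumably treated as terminal, which would explain `done = TRUE` in the output.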