This function creates an environment for reinforcement learning.

makeEnvironment(class = "custom", discount = 1, ...)

Arguments

class

[character(1)] Class of environment. One of c("custom", "mdp", "gym", "gridworld").

discount

[numeric(1) in (0, 1]] Discount factor. The default of 1 means no discounting.

...

[any] Arguments passed on to the specific environment.
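As a rough illustration of what a discount factor does, the following base R sketch computes the discounted return of a reward sequence using the standard definition (the sum of discount^t * r_t, with t starting at 0). The helper discounted_return is hypothetical and not part of the package; how the package applies discount internally is not shown here.

```r
# Discounted return of a reward sequence: sum over t of discount^t * r_t,
# where t = 0 for the first reward.
# NOTE: discounted_return is a hypothetical helper, not a package function.
discounted_return = function(rewards, discount = 1) {
  stopifnot(is.numeric(rewards), discount > 0, discount <= 1)
  steps = seq_along(rewards) - 1  # exponents 0, 1, 2, ...
  sum(discount^steps * rewards)
}

rewards = c(1, 1, 1)
discounted_return(rewards, discount = 1)    # plain sum of the rewards
discounted_return(rewards, discount = 0.9)  # 1 + 0.9 + 0.81
```

With discount = 1 every reward counts equally; smaller values weight later rewards less.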

Value

R6 object of class Environment.

Details

Use the step method to interact with the environment.

Note that all states and actions are numbered starting with 0!
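Because R itself indexes from 1, the 0-based state and action ids need to be shifted by one whenever they are used to index an R vector or matrix. The sketch below shows one way to do this for a value table; Q, get_q and set_q are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical action-value table for 3 states and 2 actions.
n.states = 3
n.actions = 2
Q = matrix(0, nrow = n.states, ncol = n.actions)

# Helpers that shift 0-based environment ids to 1-based R indices.
get_q = function(Q, state, action) Q[state + 1, action + 1]
set_q = function(Q, state, action, value) {
  Q[state + 1, action + 1] = value
  Q
}

# State 0 is the first row, action 1 is the second column.
Q = set_q(Q, state = 0, action = 1, value = 10)
get_q(Q, state = 0, action = 1)
Q[1, 2]  # the same cell, addressed with R's own 1-based indices
```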

For a detailed explanation and more examples, have a look at the vignette "How to create an environment?".

Methods

  • $step(action) Takes an action in the environment. Returns a list with state, reward and done.

  • $reset() Resets the done flag of the environment and returns an initial state. Useful when starting a new episode.

  • $visualize() Visualizes the environment (if there is a visualization function).
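The reset and step methods above are typically combined into an episode loop. The following is a minimal sketch of such a loop, not the package's own implementation. So that it runs without the package installed, it uses a hypothetical stand-in environment (make_chain) that only mimics the documented interface: $reset() returns an initial state and $step(action) returns a list with state, reward and done. The names run_episode, make_chain and max.steps are assumptions introduced for this example.

```r
# Run one episode against any object that exposes $reset() and $step(action)
# with the return values described in the Methods section.
run_episode = function(env, policy, max.steps = 100) {
  state = env$reset()
  total.reward = 0
  steps = 0
  while (steps < max.steps) {
    res = env$step(policy(state))
    total.reward = total.reward + res$reward
    state = res$state
    steps = steps + 1
    if (res$done) break  # episode finished
  }
  list(total.reward = total.reward, steps = steps, final.state = state)
}

# Hypothetical stand-in environment (NOT part of the package): a chain of
# states 0, 1, ..., n.states - 1. Action 1 moves one state to the right,
# action 0 stays. Reaching the last state ends the episode with reward 1.
make_chain = function(n.states = 4) {
  state = 0
  list(
    reset = function() {
      state <<- 0
      state
    },
    step = function(action) {
      state <<- min(state + action, n.states - 1)
      done = state == n.states - 1
      list(state = state, reward = as.numeric(done), done = done)
    }
  )
}

# A policy that always moves right.
res = run_episode(make_chain(), policy = function(state) 1)
res$steps
res$total.reward
```

An object returned by makeEnvironment should be usable in place of make_chain() here, assuming it is called through the same $reset() and $step(action) methods.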

Environments

Examples

step = function(self, action) {
  state = list(mean = action + rnorm(1), sd = runif(1))
  reward = rnorm(1, state[[1]], state[[2]])
  done = FALSE
  list(state, reward, done)
}

reset = function(self) {
  state = list(mean = 0, sd = 1)
  state
}

env = makeEnvironment(step = step, reset = reset, discount = 0.9)
env$reset()
#> $mean
#> [1] 0
#>
#> $sd
#> [1] 1
#>
env$step(100)
#> $state
#> $state$mean
#> [1] 98.42394
#>
#> $state$sd
#> [1] 0.9320559
#>
#>
#> $reward
#> [1] 97.7765
#>
#> $done
#> [1] FALSE
#>
# Create a Markov Decision Process.
P = array(0, c(2, 2, 2))
P[, , 1] = matrix(c(0.5, 0.5, 0, 1), 2, 2, byrow = TRUE)
P[, , 2] = matrix(c(0, 1, 0, 1), 2, 2, byrow = TRUE)
R = matrix(c(5, 10, -1, 2), 2, 2, byrow = TRUE)
env = makeEnvironment("mdp", transitions = P, rewards = R)
env$reset()
#> [1] 0
env$step(1L)
#> $state
#> [1] 1
#>
#> $reward
#> [1] 10
#>
#> $done
#> [1] TRUE
#>
# Create a Gridworld.
grid = makeEnvironment("gridworld", shape = c(4, 4),
  goal.states = 15, initial.state = 0)
grid$visualize()
#> o - - -
#> - - - -
#> - - - -
#> - - - -
#>
# NOT RUN {
# Create an OpenAI Gym environment.
# Make sure you have Python, gym and reticulate installed.
env = makeEnvironment("gym", gym.name = "MountainCar-v0")

# Take random actions for 200 steps.
env$reset()
for (i in 1:200) {
  action = sample(env$actions, 1)
  env$step(action)
  env$visualize()
}
env$close()
# }