Gymnasium Documentation

The classic “agent-environment loop” pictured below is a simplified representation of reinforcement learning that Gymnasium implements.

This loop is implemented using the following Gymnasium code:

import gymnasium as gym

env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

First, an environment is created using make with an additional keyword "render_mode" that specifies how the environment should be visualised. See render for details on the default meaning of different render modes. In this example, we use the "LunarLander-v2" environment, where the agent controls a spaceship that needs to land safely.
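As a concrete sketch of the render_mode keyword, the standard "rgb_array" mode returns frames as arrays instead of opening a window, which is useful for recording videos:

    import gymnasium as gym

    # "human" renders continuously to a window; "rgb_array" makes render()
    # return the current frame as an array instead.
    env = gym.make("LunarLander-v2", render_mode="rgb_array")
    observation, info = env.reset()
    frame = env.render()  # RGB array of shape (height, width, 3)
    env.close()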

After initializing the environment, we reset it to get the first observation. To initialize the environment with a particular random seed or options (see the environment documentation for possible values), use the seed or options parameters with reset.
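For example, a minimal sketch of seeding reset for reproducible episodes (the options dict is environment-specific, so no concrete keys are shown here):

    import gymnasium as gym

    env = gym.make("LunarLander-v2")
    # Passing a seed makes the initial state (and the env's RNG) reproducible.
    observation, info = env.reset(seed=42)
    # Some environments also accept an options dict; the valid keys are
    # listed in each environment's documentation:
    # observation, info = env.reset(seed=42, options={...})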

Next, the agent performs an action in the environment with step. This can be imagined as moving a robot or pressing a button on a game controller, causing a change within the environment. As a result, the agent receives a new observation from the updated environment along with a reward for taking the action. This reward could, for instance, be positive for destroying an enemy or negative for moving into lava. One such action-observation exchange is referred to as a timestep.
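To make the timestep concrete, here is a single action-observation exchange in isolation (a sketch; the random sample stands in for a real agent policy):

    import gymnasium as gym

    env = gym.make("LunarLander-v2")
    observation, info = env.reset(seed=42)

    action = env.action_space.sample()  # stand-in for a learned policy
    observation, reward, terminated, truncated, info = env.step(action)
    print(reward)  # scalar feedback for this single action
    env.close()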

However, after some timesteps, the environment may end; this is called the terminal state. For instance, the robot may have crashed, or the agent may have succeeded in completing a task; since the agent cannot continue, the environment needs to stop. In Gymnasium, if the environment has terminated, this is returned by step as the terminated signal. Similarly, we may also want the environment to end after a fixed number of timesteps; in this case, the environment issues a truncated signal. If either terminated or truncated is true, reset should be called next to restart the environment.
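Putting the two signals together, a common pattern is to run one full episode until either flag is set (a minimal sketch using the same random policy as above):

    import gymnasium as gym

    env = gym.make("LunarLander-v2")
    observation, info = env.reset()

    episode_over = False
    while not episode_over:
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        # terminated: the environment reached a terminal state (e.g. crash or success)
        # truncated: the episode was cut short, e.g. by a time limit
        episode_over = terminated or truncated

    env.close()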
