Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
also be chained to combine their effects. Most environments that are generated via gym.make
will already be wrapped by default.
In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along with (possibly optional) parameters to the wrapper’s constructor:
>>> import gym
>>> from gym.wrappers import RescaleAction
>>> base_env = gym.make("BipedalWalker-v3")
>>> base_env.action_space
Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
>>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
>>> wrapped_env.action_space
Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
You can access the environment underneath the first wrapper by using the .env attribute:
>>> wrapped_env
>>> wrapped_env.env
If you want to get to the environment underneath all of the layers of wrappers, you can use the .unwrapped attribute. If the environment is already a bare environment, the .unwrapped attribute will just return itself.
>>> wrapped_env
>>> wrapped_env.unwrapped
<gym.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
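Chaining and unwrapping can be tried without any extra dependencies using a small stand-in environment. The following is a minimal sketch: EchoEnv is a hypothetical environment invented for this illustration, and it assumes the step API that returns five values.

```python
import gym
import numpy as np
from gym.spaces import Box
from gym.wrappers import ClipAction, RescaleAction


class EchoEnv(gym.Env):
    """Hypothetical stand-in environment that records the action it receives."""

    def __init__(self):
        self.action_space = Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.last_action = None

    def reset(self, **kwargs):
        return np.zeros(2, dtype=np.float32), {}

    def step(self, action):
        self.last_action = np.asarray(action, dtype=np.float32)
        return self.last_action, 0.0, False, False, {}


base_env = EchoEnv()
# Actions pass through the outermost wrapper first: RescaleAction maps
# [0, 1] onto [-1, 1], then ClipAction clips to the base action space.
env = RescaleAction(ClipAction(base_env), min_action=0.0, max_action=1.0)
env.step(np.array([0.0, 0.5], dtype=np.float32))

print(base_env.last_action)       # the action as seen by the base environment
print(env.env.env is base_env)    # .env peels off one layer at a time
print(env.unwrapped is base_env)  # .unwrapped skips all layers at once
```

Because wrappers nest, the order matters: swapping the two wrappers would clip the incoming action to the base bounds before rescaling it.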
There are three common things you might want a wrapper to do:
Transform actions before applying them to the base environment
Transform observations that are returned by the base environment
Transform rewards that are returned by the base environment
Such wrappers can be easily implemented by inheriting from ActionWrapper, ObservationWrapper, or RewardWrapper and implementing the respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the Wrapper class directly.
The code that is presented in the following sections can also be found in the gym-examples repository.
If you would like to apply a function to the action before passing it to the base environment, you can simply inherit from ActionWrapper and override the method action to implement that transformation. The transformation defined in that method must take values in the base environment’s action space. However, its domain might differ from the original action space. In that case, you need to specify the new action space of the wrapper by setting self._action_space in the __init__ method of your wrapper.
Let’s say you have an environment with an action space of type Box, but you would only like to use a finite subset of actions. Then, you might want to implement the following wrapper:
import gym
import numpy as np
from gym.spaces import Discrete

class DiscreteActions(gym.ActionWrapper):
    def __init__(self, env, disc_to_cont):
        super().__init__(env)  # must run before overriding the action space
        self.disc_to_cont = disc_to_cont
        self._action_space = Discrete(len(disc_to_cont))

    def action(self, act):
        # Map the discrete index to the corresponding continuous action
        return self.disc_to_cont[act]

if __name__ == "__main__":
    env = gym.make("LunarLanderContinuous-v2")
    wrapped_env = DiscreteActions(env, [np.array([1, 0]), np.array([-1, 0]),
                                        np.array([0, 1]), np.array([0, -1])])
    print(wrapped_env.action_space)  # Discrete(4)
Among others, Gym provides the action wrappers ClipAction and RescaleAction.
If you would like to apply a function to the observation that is returned by the base environment before passing it to learning code, you can simply inherit from ObservationWrapper and override the method observation to implement that transformation. The transformation defined in that method must be defined on the base environment’s observation space. However, it may take values in a different space. In that case, you need to specify the new observation space of the wrapper by setting self._observation_space in the __init__ method of your wrapper.
For example, you might have a 2D navigation task where the environment returns dictionaries as observations with keys "agent_position" and "target_position". A common thing to do might be to throw away some degrees of freedom and only consider the position of the target relative to the agent, i.e. observation["target_position"] - observation["agent_position"]. For this, you could implement an observation wrapper like this:
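from gym.spaces import Box

class RelativePosition(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self._observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)

    def observation(self, obs):
        # Keys match the observation dictionary described above
        return obs["target_position"] - obs["agent_position"]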
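To exercise such a wrapper end to end, it can be paired with a small stand-in environment. The sketch below is only an illustration: ToyNavigationEnv is a hypothetical environment with made-up positions, and the code assumes the step API that returns five values.

```python
import gym
import numpy as np
from gym.spaces import Box, Dict, Discrete


class ToyNavigationEnv(gym.Env):
    """Hypothetical 2D task whose observations are dictionaries."""

    def __init__(self):
        position = Box(low=-10.0, high=10.0, shape=(2,), dtype=np.float32)
        self.observation_space = Dict(
            {"agent_position": position, "target_position": position}
        )
        self.action_space = Discrete(4)
        self._agent = np.array([1.0, 2.0], dtype=np.float32)
        self._target = np.array([4.0, 6.0], dtype=np.float32)

    def _get_obs(self):
        return {
            "agent_position": self._agent.copy(),
            "target_position": self._target.copy(),
        }

    def reset(self, **kwargs):
        return self._get_obs(), {}

    def step(self, action):
        # The agent never moves in this sketch; only the observation format matters.
        return self._get_obs(), 0.0, False, False, {}


class RelativePosition(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self._observation_space = Box(
            low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32
        )

    def observation(self, obs):
        return obs["target_position"] - obs["agent_position"]


env = RelativePosition(ToyNavigationEnv())
obs, reward, terminated, truncated, info = env.step(0)
print(obs)                    # relative position instead of a dictionary
print(env.observation_space)  # the space declared by the wrapper
```

The learning code now only ever sees a two-dimensional vector, while the base environment keeps producing dictionaries.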
Among others, Gym provides the observation wrapper TimeAwareObservation, which adds information about the index of the timestep to the observation.
If you would like to apply a function to the reward that is returned by the base environment before passing it to learning code, you can simply inherit from RewardWrapper and override the method reward to implement that transformation. This transformation might change the reward range; to specify the reward range of your wrapper, you can simply define self._reward_range in __init__.
Let us look at an example: Sometimes (especially when we do not have control over the reward because it is intrinsic), we want to clip the reward to a range to gain some numerical stability. To do that, we could, for instance, implement the following wrapper:
class ClipReward(gym.RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self._reward_range = (min_reward, max_reward)

    def reward(self, reward):
        # Clip the base environment's reward to [min_reward, max_reward]
        return np.clip(reward, self.min_reward, self.max_reward)
Some users may want a wrapper which automatically resets its wrapped environment when that environment reaches the done state. An advantage of this wrapper is that it will never produce undefined behavior as standard gym environments do when stepping beyond the done state.
When calling step causes self.env.step() to return (terminated or truncated)=True, self.env.reset() is called, and the return format of self.step() is as follows:
new_obs, closing_reward, closing_terminated, closing_truncated, info
new_obs is the first observation after calling self.env.reset().
closing_reward is the reward after calling self.env.step(), prior to calling self.env.reset().
The expression (closing_terminated or closing_truncated) is always True.
info is a dict containing all the keys from the info dict returned by the call to self.env.reset(), with two additional keys: "closing_observation", containing the observation returned by the last call to self.env.step(), and "closing_info", containing the info dict returned by the last call to self.env.step().
If (terminated or truncated) is not true when self.env.step() is called, self.step() returns
obs, reward, terminated, truncated, info
as normal.
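The return format described above can be sketched as a small custom wrapper. This is an illustrative re-implementation, not Gym's own AutoResetWrapper: SimpleAutoReset and CountdownEnv are hypothetical names, and the sketch assumes that reset() returns an (observation, info) pair.

```python
import gym
import numpy as np
from gym.spaces import Box, Discrete


class CountdownEnv(gym.Env):
    """Hypothetical environment that terminates after three steps."""

    def __init__(self):
        self.observation_space = Box(low=0.0, high=3.0, shape=(1,), dtype=np.float32)
        self.action_space = Discrete(2)
        self._t = 0

    def reset(self, **kwargs):
        self._t = 0
        return np.array([0.0], dtype=np.float32), {"episode_start": True}

    def step(self, action):
        self._t += 1
        obs = np.array([self._t], dtype=np.float32)
        terminated = self._t >= 3
        return obs, 1.0, terminated, False, {"t": self._t}


class SimpleAutoReset(gym.Wrapper):
    """Illustrative sketch of the auto-reset behavior described above."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # Assumption: reset() returns (observation, info).
            new_obs, new_info = self.env.reset()
            new_info = dict(new_info)
            new_info["closing_observation"] = obs
            new_info["closing_info"] = info
            return new_obs, reward, terminated, truncated, new_info
        return obs, reward, terminated, truncated, info


env = SimpleAutoReset(CountdownEnv())
env.reset()
transitions = [env.step(0) for _ in range(4)]
```

The third transition carries the first observation of the new episode together with the reward and termination flag of the old one, and the fourth step continues in the new episode without an explicit reset.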
The AutoResetWrapper is not applied by default when calling gym.make(), but can be applied by setting the optional autoreset argument to True:
env = gym.make("CartPole-v1", autoreset=True)
The AutoResetWrapper can also be applied using its constructor:
from gym.wrappers import AutoResetWrapper
env = gym.make("CartPole-v1")
env = AutoResetWrapper(env)
When using the AutoResetWrapper to collect rollouts, note that when self.env.step() returns (terminated or truncated)=True, self.step() returns a new observation obtained from self.env.reset() alongside the terminal reward and done state from the previous episode. If you need the terminal state from the previous episode, you need to retrieve it via the "closing_observation" key in the info dict. Make sure you know what you’re doing if you use this wrapper!
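A rollout loop that accounts for this could look roughly like the sketch below. It is written in plain Python so that it runs on its own: AutoResettingCounter is a hypothetical stand-in that mimics the return format of an auto-resetting environment, and collect_transitions is an illustrative helper, not part of Gym.

```python
class AutoResettingCounter:
    """Hypothetical stand-in for an auto-resetting env; episodes last two steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        if self.t >= 2:
            closing_obs = self.t
            new_obs, info = self.reset()
            info = dict(info, closing_observation=closing_obs, closing_info={})
            return new_obs, 1.0, True, False, info
        return self.t, 0.0, False, False, {}


def collect_transitions(env, first_obs, policy, num_steps):
    """Collect (obs, action, reward, next_obs, terminated, truncated) tuples."""
    transitions = []
    obs = first_obs
    for _ in range(num_steps):
        action = policy(obs)
        next_obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            # next_obs already belongs to the new episode; the true terminal
            # observation of the finished episode is stored in info.
            stored_next = info["closing_observation"]
        else:
            stored_next = next_obs
        transitions.append((obs, action, reward, stored_next, terminated, truncated))
        obs = next_obs  # continue from the new episode's first observation
    return transitions


env = AutoResettingCounter()
first_obs, _ = env.reset()
rollout = collect_transitions(env, first_obs, policy=lambda obs: 0, num_steps=4)
```

Storing the closing observation keeps each transition inside a single episode, which matters for any method that bootstraps from the next observation.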
Because the step method now returns two booleans (terminated and truncated) instead of a single done flag, which is a breaking change, this wrapper was introduced to ease the transition. It is applied by default in make to transform any environment into the new API.
>>> import gym
>>> env = gym.make("CartPole-v1")
>>> env
>>> env.reset()
array([-0.03018865, -0.02190439, -0.02665936, 0.02980426], dtype=float32)
>>> env.step(env.action_space.sample())
(array([-0.03062674, 0.17358953, -0.02606327, -0.27116933], dtype=float32), 1.0, False, False, {})
The argument return_two_dones=False can be set at make to transform the new step API to the old one.
>>> env = gym.make("CartPole-v1", return_two_dones=False)
>>> env.reset()
array([0.02902522, 0.01894217, 0.00593221, 0.03430589], dtype=float32)
>>> env.step(env.action_space.sample())
(array([ 0.03368363, 0.40900537, 0.00148834, -0.54708755], dtype=float32), 1.0, False, {})
Registered environments that use the old API are automatically transformed into the new API, since this is the default setting (e.g. Atari envs):
>>> env = gym.make("ALE/Breakout-v5")
>>> obs = env.reset()
>>> step_returns = env.step(env.action_space.sample())
>>> len(step_returns)
5
To retain the old API, set return_two_dones=False:
>>> env = gym.make("ALE/Breakout-v5", return_two_dones=False)
>>> obs = env.reset()
>>> step_returns = env.step(env.action_space.sample())
>>> len(step_returns)
4
Vector envs do not directly support the old step API. Instead, the StepCompatibilityVector wrapper can be used with return_two_dones=False. This can be done in make for registered environments as well as explicitly.
>>> from gym.vector import StepCompatibilityVector, SyncVectorEnv
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> obs = envs.reset()
>>> step_returns = envs.step(envs.action_space.sample())
>>> len(step_returns)
5
>>> envs = gym.vector.make("CartPole-v1", num_envs=3, return_two_dones=False)
>>> obs = envs.reset()
>>> step_returns = envs.step(envs.action_space.sample())
>>> len(step_returns)
4
>>> envs = StepCompatibilityVector(SyncVectorEnv([NewAPIEnv, NewAPIEnv]), return_two_dones=False)
>>> obs = envs.reset()
>>> step_returns = envs.step(envs.action_space.sample())
>>> len(step_returns)
4
Here, NewAPIEnv stands for any environment class defined with the new API; it is not defined in this example.
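Conceptually, converting between the two step APIs amounts to merging or splitting the done information. The following plain-Python sketch illustrates the idea for a single (non-vector) environment. The function names are hypothetical, and the use of a "TimeLimit.truncated" key in info to remember truncation is an assumption made for this illustration rather than a description of the wrapper's exact implementation.

```python
def to_old_step_api(step_returns):
    """Collapse (obs, reward, terminated, truncated, info) into a 4-tuple."""
    obs, reward, terminated, truncated, info = step_returns
    info = dict(info)  # copy so the caller's dict is not modified
    if terminated or truncated:
        # Assumed convention: record whether the episode ended by truncation only.
        info["TimeLimit.truncated"] = truncated and not terminated
    return obs, reward, terminated or truncated, info


def to_new_step_api(step_returns):
    """Split the single done flag of a 4-tuple into terminated and truncated."""
    obs, reward, done, info = step_returns
    info = dict(info)
    truncated = bool(info.pop("TimeLimit.truncated", False))
    return obs, reward, done and not truncated, done and truncated, info


# A truncated episode survives the round trip through the old format.
old_style = to_old_step_api(("obs", 1.0, False, True, {}))
new_style = to_new_step_api(old_style)
print(len(old_style), len(new_style))
```

Without some marker in info, the old format cannot distinguish termination from truncation, which is why a converter has to treat an unmarked done=True as termination.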
General Wrappers
Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the reward based on data in info or change the rendering behavior). Such wrappers can be implemented by inheriting from Wrapper.
You can set a new action or observation space by defining self._action_space or self._observation_space in __init__, respectively.
You can set new metadata and reward range by defining self._metadata and self._reward_range in __init__, respectively.
You can override step, render, close etc. If you do this, you can access the environment that was passed to your wrapper (which still might be wrapped in some other wrapper) by accessing the attribute self.env.
Let’s also take a look at an example for this case. Most MuJoCo environments return a reward that consists of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during initialization of the environment. However, Reacher does not allow you to do this! Nevertheless, all individual terms of the reward are returned in info, so let us build a wrapper for Reacher that allows us to weight those terms:
class ReacherRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        # Discard the original reward and recombine the terms reported in info
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info
It is not sufficient to use a RewardWrapper in this case, because its reward method only receives the scalar reward and has no access to info!
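The same pattern can be tried without MuJoCo on a stand-in environment. In this sketch, TwoTermEnv and WeightedReward are hypothetical names and the reward terms are made-up values; the only thing carried over from the Reacher example is the idea of recombining terms found in info inside an overridden step.

```python
import gym
import numpy as np
from gym.spaces import Box


class TwoTermEnv(gym.Env):
    """Hypothetical environment that reports two reward terms in info."""

    def __init__(self):
        self.observation_space = Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.action_space = Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self, **kwargs):
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        reward_dist = -0.5  # made-up distance penalty
        reward_ctrl = -float(np.sum(np.square(action)))  # penalizes large actions
        info = {"reward_dist": reward_dist, "reward_ctrl": reward_ctrl}
        obs = np.zeros(1, dtype=np.float32)
        return obs, reward_dist + reward_ctrl, False, False, info


class WeightedReward(gym.Wrapper):
    def __init__(self, env, dist_weight, ctrl_weight):
        super().__init__(env)
        self.dist_weight = dist_weight
        self.ctrl_weight = ctrl_weight

    def step(self, action):
        # step() sees the whole transition, including info, so it can
        # rebuild the reward from its individual terms.
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = (
            self.dist_weight * info["reward_dist"]
            + self.ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info


env = WeightedReward(TwoTermEnv(), dist_weight=2.0, ctrl_weight=0.5)
obs, reward, terminated, truncated, info = env.step(np.array([1.0], dtype=np.float32))
print(reward)  # weighted sum of the two terms reported in info
```

Overriding step is the design choice that makes this possible: a method that only receives the scalar reward could rescale it, but could not reweight its components.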
Available Wrappers
Name | Type | Arguments | Description |
Implements the best practices from Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents” but will be deprecated soon. |
Clip the continuous action to the valid bound specified by the environment’s |
If you have an environment that returns dictionaries as observations, but you would like to only keep a subset of the entries, you can use this wrapper. |
Observation wrapper that flattens the observation |
Observation wrapper that stacks the observations in a rolling manner. For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. Observations will be objects of type |
Convert the image observation from RGB to gray scale. By default, the resulting observation will be 2-dimensional. If |
This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance. |
This wrapper will normalize observations s.t. each coordinate is centered with unit variance. The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was newly instantiated or the policy was changed recently. |
This will produce an error if |
Augment observations by pixel values obtained via |
This will keep track of cumulative rewards and episode lengths. At the end of an episode, the statistics of the episode will be added to |
This wrapper will record videos of rollouts. The results will be saved in the folder specified via |
Rescales the continuous action space of the environment to a range [ |
This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes the observation to the shape given by the tuple |
Transforms environments from old step API to new and vice-versa. Old env.step returns one boolean |
Transforms vector environments from new step API to old. Old env.step returns one boolean vector |
Augment the observation with current time step in the trajectory (by appending it to the observation). This can be useful to ensure that things stay Markov. Currently it only works with one-dimensional observation spaces. |
Probably the most useful wrapper in Gym. This wrapper will emit a done signal if the specified number of steps is exceeded in an episode. In order to be able to distinguish termination and truncation, you need to check |
This wrapper will apply |
This wrapper will apply |