With the increasing interest in multi-agent training with a gym-like API, we provide a
wrapper that exposes the PettingZoo API. The wrapper is built on top of our
UnityEnvironment class, which is the default way of interfacing with a Unity
environment from Python.
The PettingZoo wrapper is part of the mlagents_envs package. Please refer to the
mlagents_envs installation instructions.
[Colab] PettingZoo Wrapper Example
This Colab notebook demonstrates how to use the wrapper, including installation, basic usage, and an example with our Striker vs Goalie environment, a multi-agent environment with multiple behavior names.
This wrapper is compatible with the PettingZoo API. Please check out the PettingZoo API page for more details. Here's an example of interacting with a wrapped environment:
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs import UnityToPettingZooWrapper

unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityToPettingZooWrapper(unity_env)
env.reset()

# policy is a user-supplied function that maps (observation, agent) to an action.
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    action = policy(observation, agent)
    env.step(action)
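
The example above calls a policy function that is not defined in this document. As a placeholder while wiring things up, a random policy can be sketched as follows. This is only an illustration: the helper name make_random_policy, the agent names, and the action counts are all made up, and the sketch assumes a single discrete action per agent. In practice the valid actions should be taken from the environment's action space for each agent.

```python
import random


def make_random_policy(num_actions, seed=None):
    """Build a policy(observation, agent) that picks a random action index.

    num_actions is a hypothetical dict mapping agent name -> number of
    discrete actions; it is not read from a real environment here.
    """
    rng = random.Random(seed)

    def policy(observation, agent):
        # The observation is ignored: this placeholder does no learning.
        return rng.randrange(num_actions[agent])

    return policy


# Agent names and action counts below are invented for illustration.
policy = make_random_policy({"striker_0": 3, "goalie_0": 3}, seed=0)
action = policy(observation=None, agent="striker_0")
```

A closure is used so that the function keeps the policy(observation, agent) signature from the example while still carrying its own random number generator.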
Instead of stepping the environment on every call to env.step(action),
the PettingZoo (PZ) wrapper stores the action and only steps the environment once every
agent requesting an action in the current step has been assigned one. This is done for
performance: communication between Unity and Python is more efficient when data is sent
in batches.

Because actions are queued in this way, env.reward should return the instantaneous reward
for that particular step, but the true reward only becomes available once an actual
environment step is performed. We recommend following the API definition for training
(read rewards from env.last() rather than env.reward); the underlying mechanism should
then not affect training results.

env.agent_iter(max_step) keeps iterating until the specified maximum step is reached
(default: 2**63). There is no need to call env.reset() other than once, right after
instantiating the environment.
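
The action queuing described above can be pictured with a small toy model. This is not the wrapper's implementation: the class QueuedStepEnv, its attributes, and the reward rule are invented for illustration, and unlike the real env.step(action) the toy step takes the agent name explicitly. It only shows why the underlying environment steps once per batch of actions, and why rewards update at that moment rather than on every call.

```python
class QueuedStepEnv:
    """Toy stand-in (not the real wrapper) that queues actions and steps in batches."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.pending = {}      # actions stored since the last real step
        self.real_steps = 0    # how many times the underlying env was stepped
        self.rewards = {a: 0.0 for a in self.agents}

    def step(self, agent, action):
        # Store the action; nothing is sent to the simulator yet.
        self.pending[agent] = action
        if len(self.pending) == len(self.agents):
            self._step_underlying()

    def _step_underlying(self):
        # One batched exchange with the simulator for all queued actions.
        self.real_steps += 1
        # Made-up reward rule so the example has something to show.
        self.rewards = {a: float(act) for a, act in self.pending.items()}
        self.pending = {}


env = QueuedStepEnv(["agent_a", "agent_b"])
env.step("agent_a", 1)
steps_after_first = env.real_steps    # 0: still waiting for agent_b
env.step("agent_b", 2)
steps_after_second = env.real_steps   # 1: both actions were sent together
```

In this toy model, the rewards dictionary is stale between the two step calls, which mirrors why reading rewards through env.last() is the recommended pattern.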