docs/Python-On-Off-Policy-Trainer-Documentation.md
<a name="mlagents.trainers.trainer.on_policy_trainer"></a>
<a name="mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer"></a>
class OnPolicyTrainer(RLTrainer)
Base class for on-policy training; the PPOTrainer, an implementation of the PPO algorithm, is built on this class.
<a name="mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer.__init__"></a>
| __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
Responsible for collecting experiences and training an on-policy model.
Arguments:
- behavior_name: The name of the behavior associated with trainer config.
- reward_buff_cap: Max reward history to track in the reward buffer.
- trainer_settings: The parameters for the trainer.
- training: Whether the trainer is set for training.
- load: Whether the model should be loaded.
- seed: The seed the model will be initialized with.
- artifact_path: The directory within which to store artifacts from this trainer.

<a name="mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer.add_policy"></a>
| add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
Adds policy to trainer.
Arguments:
- parsed_behavior_id: Behavior identifiers that the policy should belong to.
- policy: Policy to associate with name_behavior_id.

<a name="mlagents.trainers.trainer.off_policy_trainer"></a>
<a name="mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer"></a>
class OffPolicyTrainer(RLTrainer)
Base class for off-policy training; the SACTrainer, an implementation of the SAC algorithm with support for discrete actions and recurrent networks, is built on this class.
<a name="mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.__init__"></a>
| __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
Responsible for collecting experiences and training an off-policy model.
Arguments:
- behavior_name: The name of the behavior associated with trainer config.
- reward_buff_cap: Max reward history to track in the reward buffer.
- trainer_settings: The parameters for the trainer.
- training: Whether the trainer is set for training.
- load: Whether the model should be loaded.
- seed: The seed the model will be initialized with.
- artifact_path: The directory within which to store artifacts from this trainer.

<a name="mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.save_model"></a>
| save_model() -> None
Saves the final training model. Overrides the default implementation to also save the replay buffer.
<a name="mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.save_replay_buffer"></a>
| save_replay_buffer() -> None
Save the training buffer's update buffer to a pickle file.
<a name="mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.load_replay_buffer"></a>
| load_replay_buffer() -> None
Loads the last saved replay buffer from a file.
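The save/load round trip above amounts to pickling the update buffer into the artifact directory and reading it back. A minimal sketch of that mechanism, using a plain list of transitions and a temporary file (the file name and buffer layout here are illustrative, not the library's):

```python
import os
import pickle
import tempfile

# Stand-in for the trainer's update buffer: a list of transition dicts.
buffer = [{"obs": [0.1, 0.2], "action": 1, "reward": 0.5}]
path = os.path.join(tempfile.mkdtemp(), "last_replay_buffer.pkl")

# save_replay_buffer: serialize the buffer to disk.
with open(path, "wb") as f:
    pickle.dump(buffer, f)

# load_replay_buffer: restore the last saved buffer from the same path.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Persisting the buffer this way is what lets an off-policy run resume training without discarding previously collected experience.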
<a name="mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.add_policy"></a>
| add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
Adds policy to trainer.
<a name="mlagents.trainers.trainer.rl_trainer"></a>
<a name="mlagents.trainers.trainer.rl_trainer.RLTrainer"></a>
class RLTrainer(Trainer)
This class is the base class for trainers that use Reward Signals.
<a name="mlagents.trainers.trainer.rl_trainer.RLTrainer.end_episode"></a>
| end_episode() -> None
A signal that the episode has ended. The buffer must be reset. Only called when the academy resets.
<a name="mlagents.trainers.trainer.rl_trainer.RLTrainer.create_optimizer"></a>
| @abc.abstractmethod
| create_optimizer() -> TorchOptimizer
Creates an Optimizer object
<a name="mlagents.trainers.trainer.rl_trainer.RLTrainer.save_model"></a>
| save_model() -> None
Saves the policy associated with this trainer.
<a name="mlagents.trainers.trainer.rl_trainer.RLTrainer.advance"></a>
| advance() -> None
Steps the trainer, taking in trajectories and updating if ready. Will block and wait briefly if there are no trajectories.
<a name="mlagents.trainers.trainer.trainer"></a>
<a name="mlagents.trainers.trainer.trainer.Trainer"></a>
class Trainer(abc.ABC)
This class is the base class for the trainers in mlagents.trainers.
<a name="mlagents.trainers.trainer.trainer.Trainer.__init__"></a>
| __init__(brain_name: str, trainer_settings: TrainerSettings, training: bool, load: bool, artifact_path: str, reward_buff_cap: int = 1)
Responsible for collecting experiences and training a neural network model.
Arguments:
- brain_name: Brain name of the brain to be trained.
- trainer_settings: The parameters for the trainer (dictionary).
- training: Whether the trainer is set for training.
- artifact_path: The directory within which to store artifacts from this trainer.
- reward_buff_cap: Max reward history to track in the reward buffer.

<a name="mlagents.trainers.trainer.trainer.Trainer.stats_reporter"></a>
| @property
| stats_reporter()
Returns the stats reporter associated with this Trainer.
<a name="mlagents.trainers.trainer.trainer.Trainer.parameters"></a>
| @property
| parameters() -> TrainerSettings
Returns the trainer parameters of the trainer.
<a name="mlagents.trainers.trainer.trainer.Trainer.get_max_steps"></a>
| @property
| get_max_steps() -> int
Returns the maximum number of steps. Is used to know when the trainer should be stopped.
Returns:
The maximum number of steps of the trainer
<a name="mlagents.trainers.trainer.trainer.Trainer.get_step"></a>
| @property
| get_step() -> int
Returns the number of steps the trainer has performed
Returns:
the step count of the trainer
<a name="mlagents.trainers.trainer.trainer.Trainer.threaded"></a>
| @property
| threaded() -> bool
Whether or not to run the trainer in a thread. True allows the trainer to update the policy while the environment is taking steps. Set to False to enforce strict on-policy updates (i.e. don't update the policy when taking steps.)
<a name="mlagents.trainers.trainer.trainer.Trainer.should_still_train"></a>
| @property
| should_still_train() -> bool
Returns whether or not the trainer should train. A Trainer could stop training if it wasn't training to begin with, or if max_steps is reached.
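The two stopping conditions described above can be sketched as a single boolean expression; the function and parameter names here are stand-ins for the trainer's internal state, not the library's API:

```python
# Hypothetical sketch of the should_still_train check: training must be
# enabled, and the trainer's step count must still be below max_steps.
def should_still_train(is_training: bool, step: int, max_steps: int) -> bool:
    return is_training and step < max_steps

assert should_still_train(True, 10, 100)        # still training
assert not should_still_train(True, 100, 100)   # max_steps reached
assert not should_still_train(False, 0, 100)    # never set to train
```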
<a name="mlagents.trainers.trainer.trainer.Trainer.reward_buffer"></a>
| @property
| reward_buffer() -> Deque[float]
Returns the reward buffer. The reward buffer contains the cumulative rewards of the most recent episodes completed by agents using this trainer.
Returns:
the reward buffer.
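A capped deque makes the "most recent episodes" behavior concrete: with `maxlen` set to `reward_buff_cap`, appending beyond capacity silently drops the oldest episode's reward. The cap value below is illustrative:

```python
from collections import deque
from typing import Deque

reward_buff_cap = 3  # illustrative cap; the real value comes from TrainerSettings
reward_buffer: Deque[float] = deque(maxlen=reward_buff_cap)

# Completing a fourth episode evicts the first episode's reward.
for episode_reward in [1.0, 2.0, 3.0, 4.0]:
    reward_buffer.append(episode_reward)

recent = list(reward_buffer)  # [2.0, 3.0, 4.0]
```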
<a name="mlagents.trainers.trainer.trainer.Trainer.save_model"></a>
| @abc.abstractmethod
| save_model() -> None
Saves model file(s) for the policy or policies associated with this trainer.
<a name="mlagents.trainers.trainer.trainer.Trainer.end_episode"></a>
| @abc.abstractmethod
| end_episode()
A signal that the episode has ended. The buffer must be reset. Only called when the academy resets.
<a name="mlagents.trainers.trainer.trainer.Trainer.create_policy"></a>
| @abc.abstractmethod
| create_policy(parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec) -> Policy
Creates a Policy object
<a name="mlagents.trainers.trainer.trainer.Trainer.add_policy"></a>
| @abc.abstractmethod
| add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
Adds policy to trainer.
<a name="mlagents.trainers.trainer.trainer.Trainer.get_policy"></a>
| get_policy(name_behavior_id: str) -> Policy
Gets policy associated with name_behavior_id
Arguments:
- name_behavior_id: Fully qualified behavior name.

Returns:
Policy associated with name_behavior_id
<a name="mlagents.trainers.trainer.trainer.Trainer.advance"></a>
| @abc.abstractmethod
| advance() -> None
Advances the trainer. Typically, this means grabbing trajectories from all subscribed trajectory queues (self.trajectory_queues), and updating a policy using the steps in them, and if needed pushing a new policy onto the right policy queues (self.policy_queues).
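The contract described above can be sketched as a drain-update-publish loop. All names below are stand-ins (plain `queue.Queue` instead of AgentManagerQueue, a dict instead of a Policy), shown only to illustrate the data flow:

```python
import queue
from typing import List

def advance(trajectory_queues: List[queue.Queue],
            policy_queues: List[queue.Queue]) -> int:
    """Drain every subscribed trajectory queue, then publish a new 'policy'."""
    steps = 0
    for tq in trajectory_queues:
        while not tq.empty():
            trajectory = tq.get_nowait()
            steps += len(trajectory)       # consume the trajectory's steps
    if steps:                              # an update happened...
        for pq in policy_queues:
            pq.put({"version": steps})     # ...so push the updated "policy"
    return steps

tq, pq = queue.Queue(), queue.Queue()
tq.put([1, 2, 3])                          # one trajectory of three steps
n_steps = advance([tq], [pq])
new_policy = pq.get_nowait()
```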
<a name="mlagents.trainers.trainer.trainer.Trainer.publish_policy_queue"></a>
| publish_policy_queue(policy_queue: AgentManagerQueue[Policy]) -> None
Adds a policy queue to the list of queues to publish to when this Trainer makes a policy update
Arguments:
- policy_queue: Policy queue to publish to.

<a name="mlagents.trainers.trainer.trainer.Trainer.subscribe_trajectory_queue"></a>
| subscribe_trajectory_queue(trajectory_queue: AgentManagerQueue[Trajectory]) -> None
Adds a trajectory queue to the list of queues for the trainer to ingest Trajectories from.
Arguments:
- trajectory_queue: Trajectory queue to read from.

<a name="mlagents.trainers.settings"></a>
<a name="mlagents.trainers.settings.deep_update_dict"></a>
deep_update_dict(d: Dict, update_d: Mapping) -> None
Similar to dict.update(), but works for nested dicts of dicts as well.
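A minimal sketch of that behavior: like `dict.update()`, but recursing into values that are themselves dicts instead of replacing them wholesale. This mirrors the documented contract; it is not the library's exact source:

```python
from collections.abc import Mapping

def deep_update_dict(d: dict, update_d: Mapping) -> None:
    """Update d in place, merging nested dicts instead of overwriting them."""
    for key, value in update_d.items():
        if key in d and isinstance(d[key], dict) and isinstance(value, Mapping):
            deep_update_dict(d[key], value)  # recurse into nested dicts
        else:
            d[key] = value                   # plain update for leaf values

config = {"hyperparameters": {"learning_rate": 3e-4, "batch_size": 64}}
deep_update_dict(config, {"hyperparameters": {"batch_size": 128}})
# learning_rate is preserved; only batch_size changes
```

This is why a partial trainer-config override can change one hyperparameter without wiping out the rest of the nested defaults.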
<a name="mlagents.trainers.settings.RewardSignalSettings"></a>
@attr.s(auto_attribs=True)
class RewardSignalSettings()
<a name="mlagents.trainers.settings.RewardSignalSettings.structure"></a>
| @staticmethod
| structure(d: Mapping, t: type) -> Any
Helper method to structure a Dict into RewardSignalSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of RewardSignalSettings classes.
<a name="mlagents.trainers.settings.ParameterRandomizationSettings"></a>
@attr.s(auto_attribs=True)
class ParameterRandomizationSettings(abc.ABC)
<a name="mlagents.trainers.settings.ParameterRandomizationSettings.__str__"></a>
| __str__() -> str
Helper method to output sampler stats to console.
<a name="mlagents.trainers.settings.ParameterRandomizationSettings.structure"></a>
| @staticmethod
| structure(d: Union[Mapping, float], t: type) -> "ParameterRandomizationSettings"
Helper method to structure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of ParameterRandomizationSettings classes.
<a name="mlagents.trainers.settings.ParameterRandomizationSettings.unstructure"></a>
| @staticmethod
| unstructure(d: "ParameterRandomizationSettings") -> Mapping
Helper method to unstructure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_unstructure_hook() and called with cattr.unstructure().
<a name="mlagents.trainers.settings.ParameterRandomizationSettings.apply"></a>
| @abc.abstractmethod
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the appropriate sampler type's set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

<a name="mlagents.trainers.settings.ConstantSettings"></a>
@attr.s(auto_attribs=True)
class ConstantSettings(ParameterRandomizationSettings)
<a name="mlagents.trainers.settings.ConstantSettings.__str__"></a>
| __str__() -> str
Helper method to output sampler stats to console.
<a name="mlagents.trainers.settings.ConstantSettings.apply"></a>
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the constant sampler type's set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

<a name="mlagents.trainers.settings.UniformSettings"></a>
@attr.s(auto_attribs=True)
class UniformSettings(ParameterRandomizationSettings)
<a name="mlagents.trainers.settings.UniformSettings.__str__"></a>
| __str__() -> str
Helper method to output sampler stats to console.
<a name="mlagents.trainers.settings.UniformSettings.apply"></a>
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the uniform sampler type's set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

<a name="mlagents.trainers.settings.GaussianSettings"></a>
@attr.s(auto_attribs=True)
class GaussianSettings(ParameterRandomizationSettings)
<a name="mlagents.trainers.settings.GaussianSettings.__str__"></a>
| __str__() -> str
Helper method to output sampler stats to console.
<a name="mlagents.trainers.settings.GaussianSettings.apply"></a>
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the Gaussian sampler type's set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

<a name="mlagents.trainers.settings.MultiRangeUniformSettings"></a>
@attr.s(auto_attribs=True)
class MultiRangeUniformSettings(ParameterRandomizationSettings)
<a name="mlagents.trainers.settings.MultiRangeUniformSettings.__str__"></a>
| __str__() -> str
Helper method to output sampler stats to console.
<a name="mlagents.trainers.settings.MultiRangeUniformSettings.apply"></a>
| apply(key: str, env_channel: EnvironmentParametersChannel) -> None
Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the multirangeuniform sampler type's set method.
Arguments:
- key: The environment parameter to be sampled.
- env_channel: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

<a name="mlagents.trainers.settings.CompletionCriteriaSettings"></a>
@attr.s(auto_attribs=True)
class CompletionCriteriaSettings()
CompletionCriteriaSettings contains the information needed to figure out if the next lesson must start.
<a name="mlagents.trainers.settings.CompletionCriteriaSettings.need_increment"></a>
| need_increment(progress: float, reward_buffer: List[float], smoothing: float) -> Tuple[bool, float]
Given measures, this method returns a boolean indicating if the lesson needs to change now, and a float corresponding to the new smoothed value.
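One plausible reading of that contract is exponential smoothing of the mean episode reward, with the lesson advancing once the smoothed value clears a threshold. The threshold and the 0.25/0.75 weights below are illustrative assumptions, not the library's actual constants:

```python
from typing import List, Tuple

def need_increment(progress: float, reward_buffer: List[float],
                   smoothing: float, threshold: float = 0.9) -> Tuple[bool, float]:
    """Return (advance_lesson?, new_smoothed_value). Sketch only."""
    if not reward_buffer:
        return False, smoothing            # no completed episodes yet
    measure = sum(reward_buffer) / len(reward_buffer)
    smoothing = 0.25 * smoothing + 0.75 * measure  # damp the noisy measure
    return smoothing >= threshold, smoothing

should_advance, new_smoothing = need_increment(0.5, [1.0, 1.0], smoothing=0.8)
# 0.25 * 0.8 + 0.75 * 1.0 = 0.95, which clears the 0.9 threshold
```

Smoothing matters here because per-episode rewards are noisy; without it a single lucky episode could trigger a premature lesson change.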
<a name="mlagents.trainers.settings.Lesson"></a>
@attr.s(auto_attribs=True)
class Lesson()
Gathers the data of one lesson for one environment parameter including its name, the condition that must be fulfilled for the lesson to be completed and a sampler for the environment parameter. If the completion_criteria is None, then this is the last lesson in the curriculum.
<a name="mlagents.trainers.settings.EnvironmentParameterSettings"></a>
@attr.s(auto_attribs=True)
class EnvironmentParameterSettings()
EnvironmentParameterSettings is an ordered list of lessons for one environment parameter.
<a name="mlagents.trainers.settings.EnvironmentParameterSettings.structure"></a>
| @staticmethod
| structure(d: Mapping, t: type) -> Dict[str, "EnvironmentParameterSettings"]
Helper method to structure a Dict of EnvironmentParameterSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().
<a name="mlagents.trainers.settings.TrainerSettings"></a>
@attr.s(auto_attribs=True)
class TrainerSettings(ExportableSettings)
<a name="mlagents.trainers.settings.TrainerSettings.structure"></a>
| @staticmethod
| structure(d: Mapping, t: type) -> Any
Helper method to structure a TrainerSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().
<a name="mlagents.trainers.settings.CheckpointSettings"></a>
@attr.s(auto_attribs=True)
class CheckpointSettings()
<a name="mlagents.trainers.settings.CheckpointSettings.prioritize_resume_init"></a>
| prioritize_resume_init() -> None
Prioritize explicit command-line resume/init over conflicting YAML options. If both resume and init are set in the same place, use resume.
<a name="mlagents.trainers.settings.RunOptions"></a>
@attr.s(auto_attribs=True)
class RunOptions(ExportableSettings)
<a name="mlagents.trainers.settings.RunOptions.from_argparse"></a>
| @staticmethod
| from_argparse(args: argparse.Namespace) -> "RunOptions"
Takes an argparse.Namespace as specified in parse_command_line, loads input configuration files
from file paths, and converts to a RunOptions instance.
Arguments:
- args: Collection of command-line parameters passed to mlagents-learn.

Returns:
RunOptions representing the passed in arguments, with trainer config, curriculum and sampler configs loaded from files.
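The merge described above (defaults, overridden by whatever the CLI actually supplied) can be sketched with a plain dict; the real RunOptions is an attrs class structured via cattr with many more fields, so everything below is a simplified stand-in:

```python
import argparse

def from_argparse_sketch(args: argparse.Namespace) -> dict:
    """Merge CLI-supplied values over a dict of defaults. Sketch only."""
    run_options = {"seed": -1, "num_envs": 1}                 # illustrative defaults
    cli = {k: v for k, v in vars(args).items() if v is not None}
    run_options.update(cli)                                   # CLI wins over defaults
    return run_options

parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=None)
ns = parser.parse_args(["--seed", "7"])
opts = from_argparse_sketch(ns)
```

Leaving the argparse defaults as `None` is the trick that lets "user didn't pass this flag" be distinguished from "user passed the default value".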