labml_nn/rl/ppo/experiment.ipynb
This is an experiment that trains an agent to play the Atari Breakout game using Proximal Policy Optimization (PPO).
Install the labml-nn package
!pip install labml-nn
Add Atari ROMs (Google Colab doesn't ship with them, so the Atari environments won't work without this step)
! wget http://www.atarimania.com/roms/Roms.rar
! mkdir /content/ROM/
! unrar e /content/Roms.rar /content/ROM/
! python -m atari_py.import_roms /content/ROM/
Imports
from labml import experiment
from labml.configs import FloatDynamicHyperParam, IntDynamicHyperParam
from labml_nn.rl.ppo.experiment import Trainer
Create an experiment
experiment.create(name="ppo")
IntDynamicHyperParam and FloatDynamicHyperParam are dynamic hyperparameters
that you can change while the experiment is running.
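The idea behind a dynamic hyperparameter is a callable that always returns its latest value, so training code that calls it every step picks up changes immediately. The mock class below illustrates that concept only; it is not labml's actual implementation.

```python
# Minimal sketch of the dynamic-hyperparameter concept (illustration
# only, not labml's implementation): a callable holding a mutable value.
class DynamicValue:
    def __init__(self, value):
        self.value = value

    def set(self, value):
        # In labml this would be driven from the monitoring app mid-run
        self.value = value

    def __call__(self):
        return self.value

lr = DynamicValue(2.5e-4)
print(lr())   # 0.00025
lr.set(1e-4)  # changed while "training" is running
print(lr())   # 0.0001
```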
configs = {
    # Number of updates
    'updates': 10000,
    # Number of epochs to train the model with sampled data
    'epochs': IntDynamicHyperParam(8),
    # Number of worker processes
    'n_workers': 8,
    # Number of steps to run on each process for a single update
    'worker_steps': 128,
    # Number of mini-batches
    'batches': 4,
    # Value loss coefficient
    'value_loss_coef': FloatDynamicHyperParam(0.5),
    # Entropy bonus coefficient
    'entropy_bonus_coef': FloatDynamicHyperParam(0.01),
    # Clip range
    'clip_range': FloatDynamicHyperParam(0.1),
    # Learning rate (allowed range 0 to 1e-3)
    'learning_rate': FloatDynamicHyperParam(2.5e-4, (0, 1e-3)),
}
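With these settings, each update collects 8 workers × 128 steps = 1024 samples, split into 4 mini-batches of 256, and trains over them for 8 epochs. The last three coefficients enter PPO's objective roughly as sketched below: the policy ratio is clipped to [1 − clip_range, 1 + clip_range], and the value loss and entropy bonus are weighted into the total loss. This is a simplified NumPy illustration of the standard PPO loss, not the `Trainer`'s actual code.

```python
import numpy as np

def ppo_loss(ratio, advantage, value_pred, value_target, entropy,
             clip_range=0.1, value_loss_coef=0.5, entropy_bonus_coef=0.01):
    # Clipped surrogate policy objective (negated, since we minimize)
    clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range)
    policy_loss = -np.mean(np.minimum(ratio * advantage, clipped * advantage))
    # Squared-error value function loss
    value_loss = np.mean((value_pred - value_target) ** 2)
    # Entropy is subtracted: higher entropy (more exploration) lowers the loss
    return (policy_loss
            + value_loss_coef * value_loss
            - entropy_bonus_coef * np.mean(entropy))

loss = ppo_loss(ratio=np.array([0.8, 1.0, 1.3]),      # new_prob / old_prob
                advantage=np.array([1.0, -0.5, 2.0]),
                value_pred=np.array([0.5, 0.2, 1.0]),
                value_target=np.array([0.6, 0.1, 1.2]),
                entropy=np.array([1.0, 1.0, 1.0]))
print(loss)  # ≈ -0.8333
```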
Set experiment configurations
experiment.configs(configs)
Create the trainer
trainer = Trainer(**configs)
Start the experiment and run the training loop.
with experiment.start():
    trainer.run_training_loop()