docs/tutorials/training.md
From the previous tutorials, you may now have a custom model and a data loader. To run training, users typically prefer one of the following two styles:
The first style is a custom training loop. With a model and a data loader ready, everything else needed to write a training loop can be found in PyTorch, and you are free to write the loop yourself. This style lets researchers manage the entire training logic explicitly and retain full control. One such example is provided in tools/plain_train_net.py.
Any customization of the training logic is then entirely in the user's hands.
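To make the shape of such a hand-written loop concrete, here is a minimal, dependency-free sketch. It is not taken from tools/plain_train_net.py: the `data_loader` and `train` functions, the toy linear model, and the hand-derived gradients are all assumptions made for illustration. In a real PyTorch loop, the gradient computation and parameter update would typically be `loss.backward()` and `optimizer.step()`.

```python
import random


def data_loader(num_batches, batch_size, seed=0):
    """Hypothetical stand-in for a data loader: yields batches of (x, y) with y = 3x + 1."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        yield [(x, 3.0 * x + 1.0) for x in xs]


def train(num_iters=500, lr=0.1):
    w, b = 0.0, 0.0  # parameters of the toy "model" y = w * x + b
    loss = float("inf")
    for it, batch in enumerate(data_loader(num_iters, batch_size=8)):
        # Forward pass: mean squared error over the batch.
        errors = [(w * x + b) - y for x, y in batch]
        loss = sum(e * e for e in errors) / len(batch)
        # Backward pass: analytic gradients (autograd would handle this in PyTorch).
        grad_w = sum(2.0 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
        grad_b = sum(2.0 * e for e in errors) / len(batch)
        # Optimizer step: plain SGD.
        w -= lr * grad_w
        b -= lr * grad_b
        # Logging: entirely up to the author of the loop.
        if it % 100 == 0:
            print(f"iter {it}: loss {loss:.6f}")
    return w, b, loss


w, b, final_loss = train()
```

Every stage (data, forward, backward, update, logging) is an ordinary statement in the loop body, which is what makes this style easy to customize.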
The second style is a trainer abstraction. We also provide a standardized "trainer" abstraction with a hook system that helps simplify the standard training behavior. It includes the following two instantiations:
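SimpleTrainer provides a minimal training loop for single-cost, single-optimizer, single-data-source training, with nothing else. Other tasks (checkpointing, logging, etc.) can be implemented using the hook system.
DefaultTrainer is a SimpleTrainer initialized from a yacs config, used by tools/train_net.py and many scripts. It includes more standard default behaviors that one might want to opt into, including default configurations for the optimizer, learning rate schedule, logging, evaluation, checkpointing, etc.
To customize a DefaultTrainer: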
For simple customizations (e.g. changing the optimizer, evaluator, LR scheduler, data loader, etc.), override its methods in a subclass, just like tools/train_net.py.
For extra tasks during training, check the hook system to see if it's supported.
As an example, to print hello during training:
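    from detectron2.engine import HookBase

    class HelloHook(HookBase):
        def after_step(self):
            # self.trainer is set when the hook is registered with a trainer
            if self.trainer.iter % 100 == 0:
                print(f"Hello at iteration {self.trainer.iter}!")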
Using a trainer+hook system means there will always be some non-standard behaviors that cannot be supported, especially in research. For this reason, we intentionally keep the trainer & hook system minimal, rather than powerful. If anything cannot be achieved by such a system, it's easier to start from tools/plain_train_net.py to implement custom training logic manually.
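To illustrate how a trainer and its hooks interact, the following is a simplified, dependency-free sketch of the pattern. `MiniTrainer` and `MiniHook` are hypothetical stand-ins written for this example, not detectron2's actual classes, and the real implementation may differ in its details. The sketch only shows the general idea: the trainer owns the loop and calls each registered hook at fixed points around every step.

```python
import weakref


class MiniHook:
    """Hypothetical hook base class: every method is a no-op unless overridden."""

    trainer = None  # set by MiniTrainer.register_hooks

    def before_train(self):
        pass

    def before_step(self):
        pass

    def after_step(self):
        pass

    def after_train(self):
        pass


class MiniTrainer:
    """Hypothetical trainer: owns the loop and calls hooks at fixed points."""

    def __init__(self):
        self.iter = 0
        self._hooks = []

    def register_hooks(self, hooks):
        for h in hooks:
            # A weak reference avoids a trainer <-> hook reference cycle.
            h.trainer = weakref.proxy(self)
        self._hooks.extend(hooks)

    def run_step(self):
        pass  # one iteration (forward, backward, optimizer step) would go here

    def train(self, start_iter, max_iter):
        for h in self._hooks:
            h.before_train()
        for self.iter in range(start_iter, max_iter):
            for h in self._hooks:
                h.before_step()
            self.run_step()
            for h in self._hooks:
                h.after_step()
        for h in self._hooks:
            h.after_train()


class HelloHook(MiniHook):
    def __init__(self, period):
        self.period = period
        self.messages = []

    def after_step(self):
        if self.trainer.iter % self.period == 0:
            self.messages.append(f"Hello at iteration {self.trainer.iter}!")


trainer = MiniTrainer()
hello = HelloHook(period=100)
trainer.register_hooks([hello])
trainer.train(start_iter=0, max_iter=250)
print(hello.messages)
```

Because hooks can only act at these predefined points, anything that needs to change what happens inside a step does not fit the pattern, which is the limitation described above.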
During training, detectron2 models and the trainer put metrics into a centralized EventStorage. You can use the following code to access it and log metrics to it:
    from detectron2.utils.events import get_event_storage

    # inside the model:
    if self.training:
        value = ...  # compute the value from inputs
        storage = get_event_storage()
        storage.put_scalar("some_accuracy", value)
Refer to its documentation for more details.
Metrics are then written to various destinations with EventWriter.
DefaultTrainer enables a few EventWriters with default configurations.
See above for how to customize them.
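As a rough illustration of the writer concept, the sketch below sends the same metrics to two destinations at a fixed period. The classes `MiniWriter`, `PrintWriter`, and `JsonLinesWriter`, and the `write(iteration, scalars)` signature, are assumptions made for this example and do not reproduce the EventWriter API.

```python
import io
import json


class MiniWriter:
    """Hypothetical writer base class: one destination for metrics."""

    def write(self, iteration, scalars):
        raise NotImplementedError

    def close(self):
        pass


class PrintWriter(MiniWriter):
    """Writes one human-readable line per call."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, iteration, scalars):
        parts = "  ".join(f"{k}: {v:.3f}" for k, v in sorted(scalars.items()))
        self.stream.write(f"iter: {iteration}  {parts}\n")


class JsonLinesWriter(MiniWriter):
    """Writes one JSON object per line, suitable for later parsing."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, iteration, scalars):
        record = {"iteration": iteration, **scalars}
        self.stream.write(json.dumps(record, sort_keys=True) + "\n")


text_out, json_out = io.StringIO(), io.StringIO()
writers = [PrintWriter(text_out), JsonLinesWriter(json_out)]
period = 20  # write every 20 iterations

for iteration in range(50):
    scalars = {"loss": 1.0 / (iteration + 1), "lr": 0.01}
    if (iteration + 1) % period == 0:
        for writer in writers:
            writer.write(iteration, scalars)

for writer in writers:
    writer.close()

print(text_out.getvalue(), end="")
```

Separating storage from writers means a new destination can be added without touching the code that produces the metrics.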