This repository contains boilerplate for a PyTorch training loop.
Some features included in this boilerplate are:
- Automatically save checkpoints of the model state, optimizer state, and training scores every N epochs (see the checkpoint sketch below)
- Load checkpoints
- If gradients are computed in float16, their values may underflow to 0. To prevent this, the loss is scaled up before backpropagation and the gradients are scaled back down before the optimizer step (see the mixed-precision sketch below)
- The model's outputs and loss are computed in lower precision to reduce memory usage
- After every batch, the GPU cache is cleared and unused objects are freed from memory (see the cleanup sketch below)
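
A minimal sketch of how the periodic checkpointing might look. The function names, file path, and the `scores` dict are illustrative assumptions, not this repository's actual API:

```python
import torch

def save_checkpoint(path, epoch, model, optimizer, scores):
    # Bundle model weights, optimizer state, and training scores into one file.
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "scores": scores,  # e.g. {"train_loss": [...], "val_acc": [...]}
    }, path)

def load_checkpoint(path, model, optimizer):
    # Restore weights and optimizer state in place; return epoch and scores.
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"], checkpoint["scores"]
```

Inside the training loop this would be called as, e.g., `save_checkpoint(f"ckpt_{epoch}.pt", epoch, model, optimizer, scores)` whenever `epoch % save_every == 0` (both names hypothetical).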
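
The standard way to get both the loss scaling and the lower-precision forward pass in PyTorch is `torch.cuda.amp.autocast` together with `torch.cuda.amp.GradScaler`; whether this repository uses exactly that API is an assumption. A mixed-precision sketch, where the model, optimizer, criterion, and data are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2).cuda()                        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()                   # tracks the loss-scale factor
dataloader = [(torch.randn(8, 10, device="cuda"),      # placeholder batch
               torch.randint(0, 2, (8,), device="cuda"))]

for inputs, targets in dataloader:
    optimizer.zero_grad()
    # Run the forward pass and loss in float16 where it is safe to do so.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    # Scale the loss up *before* backprop so small gradients do not underflow.
    scaler.scale(loss).backward()
    # Unscale the gradients and step; the step is skipped if inf/NaN appear.
    scaler.step(optimizer)
    # Adjust the scale factor for the next iteration.
    scaler.update()
```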
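
One plausible implementation of the per-batch cleanup, assuming a CUDA device; the exact calls this repository uses are an assumption:

```python
import gc
import torch

outputs = torch.randn(64, 1000, device="cuda")  # stand-in for a batch's outputs
loss = outputs.mean()                           # stand-in for the batch's loss

# End-of-batch cleanup: drop references to the largest intermediate tensors,
# collect unreachable Python objects, and release cached GPU memory
# back to the driver.
del outputs, loss
gc.collect()
torch.cuda.empty_cache()
```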