To walk through the main project structure, let's train an MLP on MNIST data.
We will cover:
- Project structure
- A simple PyTorch MLP model
- PyTorch-Lightning based training
- A single point of entry for running experiments: `python3 run_experiment.py`
Please complete Lab Setup before proceeding!
Then, in the `fsdl-text-recognizer-2021-labs` repo, let's pull the latest changes and enter the lab1 directory:

```
git pull
cd lab1/
```
In our labs, we will be building a codebase incrementally. Each week, a new lab will be released, showing more of the codebase.
Today, we start with the bare minimum:
```
(fsdl-text-recognizer-2021) ➜ lab1 git:(main) ✗ tree -I "logs|admin|wandb|__pycache__"
.
├── readme.md
├── text_recognizer
│   ├── data
│   │   ├── base_data_module.py
│   │   ├── __init__.py
│   │   ├── mnist.py
│   │   └── util.py
│   ├── __init__.py
│   ├── lit_models
│   │   ├── base.py
│   │   └── __init__.py
│   ├── models
│   │   ├── __init__.py
│   │   └── mlp.py
│   └── util.py
└── training
    ├── __init__.py
    └── run_experiment.py
```
We can see that the main breakdown of the codebase is between `text_recognizer` and `training`.

The former, `text_recognizer`, should be thought of as a Python package that we are developing and will eventually deploy in some way.

The latter, `training`, should be thought of as support code for developing `text_recognizer`, which currently consists simply of `run_experiment.py`.

Within `text_recognizer`, there is a further breakdown between `data`, `models`, and `lit_models`.
Let's go through them in sequence.
There are three scopes of our code dealing with data, with slightly overlapping names: `DataModule`, `DataLoader`, and `Dataset`.
At the top level are `DataModule` classes, which are responsible for quite a few things:

- Downloading raw data and/or generating synthetic data
- Processing data as needed to get it ready to go through PyTorch models
- Splitting data into train/val/test sets
- Specifying dimensions of the inputs (e.g. a `(C, H, W)` float tensor)
- Specifying information about the targets (e.g. a class mapping)
- Specifying data augmentation transforms to apply in training
In the process of doing the above, `DataModule`s make use of a couple of other classes:

- They wrap underlying data in a `torch Dataset`, which returns individual (and optionally, transformed) data instances.
- They wrap the `torch Dataset` in a `torch DataLoader`, which samples batches, shuffles their order, and delivers them to the GPU (see the sketch below).
If need be, you can read more about these PyTorch data interfaces.
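To make the relationship concrete, here is a tiny, self-contained illustration of the `Dataset` → `DataLoader` wrapping. This is not the lab's code, and the data here is fake:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fake "raw data": 100 grayscale 28x28 images with integer class labels
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))

# The Dataset returns individual (x, y) instances by index
dataset = TensorDataset(images, labels)

# The DataLoader samples batches from the Dataset and shuffles their order
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
    break
```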
To avoid writing the same boilerplate for every data source, we define a simple base class, `text_recognizer.data.BaseDataModule`, which in turn inherits from `pl.LightningDataModule`. This inheritance lets us use the data very simply with the PyTorch-Lightning `Trainer` and avoid common problems with distributed training.
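In rough outline, such a base class mainly has to hand `DataLoader`s to the `Trainer`. The sketch below is simplified and its details are illustrative; see `base_data_module.py` for the real thing:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

class BaseDataModule(pl.LightningDataModule):
    """Simplified sketch: subclasses download/process data and set the splits."""

    def __init__(self, batch_size: int = 128, num_workers: int = 4):
        super().__init__()
        self.batch_size = batch_size
        self.num_workers = num_workers
        # Subclasses (e.g. an MNIST data module) set these in prepare_data()/setup()
        self.data_train = self.data_val = self.data_test = None

    def train_dataloader(self):
        return DataLoader(self.data_train, batch_size=self.batch_size,
                          shuffle=True, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(self.data_val, batch_size=self.batch_size,
                          shuffle=False, num_workers=self.num_workers)

    def test_dataloader(self):
        return DataLoader(self.data_test, batch_size=self.batch_size,
                          shuffle=False, num_workers=self.num_workers)
```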
Models are what are commonly known as "neural nets": code that accepts an input, processes it through layers of computation, and produces an output.

Most importantly, the code is partially written (the architecture of the neural net) and partially learned (the parameters, or weights, of all the layers in the architecture). Therefore, the computation of the model must be back-propagatable.

Since we are using PyTorch, all of our models subclass `torch.nn.Module`, which makes them learnable in this way.
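For example, an MLP along the lines of `text_recognizer/models/mlp.py` might look like the following. The layer sizes and details here are illustrative, not the file's exact contents:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, input_dim: int = 28 * 28, fc1_dim: int = 1024,
                 fc2_dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, fc1_dim)
        self.fc2 = nn.Linear(fc1_dim, fc2_dim)
        self.fc3 = nn.Linear(fc2_dim, num_classes)

    def forward(self, x):
        x = torch.flatten(x, 1)   # (B, C, H, W) -> (B, C*H*W)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)        # raw logits; the loss function handles softmax
```

Note that `forward` is built entirely from differentiable torch operations, which is what keeps the whole computation back-propagatable.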
We use PyTorch-Lightning for training, which defines the `LightningModule` interface. A `LightningModule` handles everything a Model (as defined above) handles, and also specifies the details of the learning algorithm: what loss should be computed from the output of the model and the ground truth, which optimizer should be used, with what learning rate, and so on.
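A minimal sketch of such a wrapper, simplified relative to the actual `text_recognizer/lit_models/base.py`:

```python
import pytorch_lightning as pl
import torch

class BaseLitModel(pl.LightningModule):
    def __init__(self, model: torch.nn.Module, lr: float = 1e-3):
        super().__init__()
        self.model = model          # the plain torch.nn.Module defined above
        self.lr = lr
        self.loss_fn = torch.nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x), y)  # loss from model output vs. ground truth
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # Which optimizer, and with what learning rate
        return torch.optim.Adam(self.parameters(), lr=self.lr)
```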
Now we understand enough to train.
Our `training/run_experiment.py` is a script that handles many command-line parameters. Here's a command we can run:

```
python3 training/run_experiment.py --model_class=MLP --data_class=MNIST --max_epochs=5 --gpus=1
```
While `model_class` and `data_class` are our own arguments, `max_epochs` and `gpus` are arguments automatically picked up from `pytorch_lightning.Trainer`.

You can use any other `Trainer` flag (see docs) on the command line, for example `--precision=16`.
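In rough outline, this works by merging the `Trainer`'s own flags into the script's argument parser, something like the following (the real script does more, e.g. dynamically importing the model and data classes):

```python
import argparse
import pytorch_lightning as pl

parser = argparse.ArgumentParser()
parser = pl.Trainer.add_argparse_args(parser)  # adds --max_epochs, --gpus, etc.
parser.add_argument("--model_class", type=str, default="MLP")
parser.add_argument("--data_class", type=str, default="MNIST")
args = parser.parse_args()

trainer = pl.Trainer.from_argparse_args(args)  # construct Trainer from the flags
```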
The `run_experiment.py` script also picks up command-line flags from the model and data classes that are specified (this is where a flag like `--batch_size=512` comes from). For example, in `text_recognizer/models/mlp.py` we specify the `MLP` class and add a couple of command-line flags: `--fc1` and `--fc2`.
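One way to wire this up is a static method on the class that `run_experiment.py` calls while building its parser; treat the method name and the defaults below as a sketch rather than the file's exact contents:

```python
import torch.nn as nn

class MLP(nn.Module):
    ...

    @staticmethod
    def add_to_argparse(parser):
        # Hidden-layer sizes, settable from the command line
        parser.add_argument("--fc1", type=int, default=1024)
        parser.add_argument("--fc2", type=int, default=128)
        return parser
```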
Accordingly, we can run

```
python3 training/run_experiment.py --model_class=MLP --data_class=MNIST --max_epochs=5 --gpus=1 --fc1=4 --fc2=8
```

and watch the model fail to achieve high accuracy due to too few parameters :)
- Try `training/run_experiment.py` with different MLP hyper-parameters (e.g. `--fc1=128 --fc2=64`).
- Try editing the MLP architecture in `text_recognizer/models/mlp.py`.