Lightning Migration #837

Merged: 79 commits merged into main from lightning on Nov 12, 2022
Conversation

@karl-richter (Collaborator) commented on Oct 17, 2022

v0 Todos

  • Change TimeNet parent from PyTorch Module to PyTorch Lightning (see the sketch after this list)
  • Add PyTorch Lightning functions to TimeNet (e.g. training_step)
  • Add a Lightning Trainer in the forecaster
  • Change the train and predict functions in the forecaster to use the Lightning Trainer
  • Change the prediction logic in forecaster.py
  • Handle minimal mode, e.g. _train_minimal(...)
  • Return metrics in _train(...)
  • Define epochs, batch size, learning rate, etc. correctly in the trainer (when not provided) and move them to the fit() method
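
A rough sketch of what the first two items amount to, assuming a recent PyTorch Lightning 1.x version; the layer, loss, and optimizer shown here are placeholders, not the actual TimeNet internals:

```python
import pytorch_lightning as pl
import torch

class TimeNet(pl.LightningModule):  # previously inherited from torch.nn.Module
    def __init__(self, n_lags: int = 7, n_forecasts: int = 1):
        super().__init__()
        self.linear = torch.nn.Linear(n_lags, n_forecasts)  # placeholder layer

    def forward(self, inputs):
        return self.linear(inputs)

    def training_step(self, batch, batch_idx):
        inputs, targets = batch
        predicted = self(inputs)
        loss = torch.nn.functional.smooth_l1_loss(predicted, targets)  # placeholder loss
        return loss  # Lightning runs backward() and optimizer.step() automatically

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)  # placeholder optimizer and lr
```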

v1 Todos

  • Remove outdated code
  • Change model saving and loading to use checkpoints (Checkpointing docs)
  • Add support for all Lightning loggers (e.g. TensorBoard)
  • Learning rate range finder
  • Use correct batch_size
  • Fix regularisation loss
  • Pass denormalization as a function
  • Improve the learning rate finder
  • Early stopping
  • Support the self.metrics.add_specific_target function when highlight_forecast_step_n is defined

v2 Todos (separate PRs)

Changes

Guiding idea: migrate from plain PyTorch to the PyTorch Lightning framework.

Consequent changes:

  • Training: Use the Lightning logic for training, evaluation and prediction (removes most training logic from forecaster.py and moves it to time_net.py) Docs
  • Metrics: Since the intra-epoch data is not available outside of the Lightning module, the metrics need to be calculated within the model. Instead of using the custom Metrics module, we switch to the torchmetrics library (from the Lightning ecosystem) to calculate metrics within training_step Docs
  • Metrics Logger: Since the Lightning default logger does not persist metrics but we want to return a metrics_df from fit(), we need to persist them in an object during runtime. Therefore, we define a custom logger for Lightning that collects metrics in a dictionary; the default logger saves and checkpoints the model automatically, which we don't always want (see the sketch after this list)
  • Progress bar: Since the epoch progress information is no longer available within the forecaster, we switch to the Lightning built-in progress bar for logging training progress (we use the rich progress bar instead of the custom tqdm progress bar and the default Lightning tqdm progress bar due to issues in Jupyter notebooks) Docs
  • Early stopping: Added support for early stopping using the Lightning loss monitor (this allows us to close Early Stopping #289)
  • Learning rate finder: We switch from the custom learning rate finder to the Lightning built-in learning rate finder Docs
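
A minimal sketch of such a metrics-collecting logger, assuming the Logger base class of a recent PyTorch Lightning release; the class name MetricsCollector and the stored dictionary layout are illustrative, not the actual implementation:

```python
from pytorch_lightning.loggers import Logger
from pytorch_lightning.utilities import rank_zero_only

class MetricsCollector(Logger):
    """Collects logged metrics in memory so they can be returned after fit()."""

    def __init__(self):
        super().__init__()
        self.history = []  # one dict of metrics per logging step

    @property
    def name(self):
        return "MetricsCollector"

    @property
    def version(self):
        return "0"

    @rank_zero_only
    def log_hyperparams(self, params):
        pass  # nothing to persist for hyperparameters

    @rank_zero_only
    def log_metrics(self, metrics, step=None):
        self.history.append({"step": step, **metrics})
```

The collected history can then be converted into the metrics_df returned by fit().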

Before

The training logic is contained in forecaster.py, which manually iterates through epochs and batches. The _train_epoch() function directly calls forward() on the TimeNet model, and optimization happens manually in _train_epoch().
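
For illustration, a heavily simplified version of that manual pattern (placeholder names and loss; not the actual forecaster.py code):

```python
import torch

def _train(model, loader, epochs=100, lr=1e-3):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            predicted = model.forward(inputs)  # direct call into TimeNet
            loss = torch.nn.functional.smooth_l1_loss(predicted, targets)
            loss.backward()    # manual backward pass
            optimizer.step()   # manual parameter update
```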

flowchart LR
    subgraph TimeNet
    forward
    end
    subgraph NeuralProphet
    fit --> _train
    _train --> _train_epoch
    _train_epoch --> forward
    predict --> _predict_raw
    _predict_raw --> forward
    end

After

The training loop is abstracted away by the Lightning training logic. We initialize a Lightning Trainer object that runs the training loop automatically. Calling fit() on the Lightning Trainer executes the model's training_step() function with the correct epoch and batch; optimization and parameter updates happen automatically after each training_step(). Lightning also provides useful tools such as a progress bar, a learning rate finder, early stopping, GPU support, etc.
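
A rough sketch of what the forecaster side can look like after the migration, assuming PyTorch Lightning 1.x; the callback settings, the MetricsCollector logger from above, the learning_rate attribute, and the dataloader names are illustrative:

```python
import pytorch_lightning as pl

def _train(model, train_loader, epochs=100):
    trainer = pl.Trainer(
        max_epochs=epochs,
        logger=MetricsCollector(),       # custom logger sketched above
        enable_checkpointing=False,      # don't checkpoint by default
        # assumes the model logs a metric named "Loss" via self.log(...)
        callbacks=[pl.callbacks.EarlyStopping(monitor="Loss", patience=10)],
    )
    # tune: run the built-in learning rate finder
    lr_finder = trainer.tuner.lr_find(model, train_dataloaders=train_loader)
    model.learning_rate = lr_finder.suggestion()
    # fit: runs training_step() for every batch and epoch
    trainer.fit(model, train_dataloaders=train_loader)
    return trainer

def _predict_raw(trainer, model, predict_loader):
    # predict: runs predict_step() on each batch
    return trainer.predict(model, dataloaders=predict_loader)
```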

flowchart LR
    subgraph TimeNet
    configure_optimizers
    training_step --> forward
    predict_step --> forward
    end
    subgraph LightningTrainer
    tune --> configure_optimizers
    fit_[fit] --> training_step
    predict_[predict] --> predict_step
    end
    subgraph NeuralProphet
    fit --> _train
    _train --> tune
    _train --> fit_
    predict --> _predict_raw
    _predict_raw --> predict_
    end

@karl-richter mentioned this pull request on Oct 27, 2022
@karl-richter added the status: needs review label (PR needs to be reviewed by Reviewer(s)) on Nov 3, 2022
@alfonsogarciadecorral (Collaborator) commented:

Hi @karl-richter

I have a really quick comment.

In order to visualize the model architecture, we need to make the following change in the training_step method:

instead of

        # Run forward calculation
        predicted = self.forward(inputs, meta_name_tensor)
        # Calculate loss
        loss, reg_loss = self.loss_func(inputs, predicted, targets)
        # Metrics

we need to add:

        # Run forward calculation
        predicted = self.forward(inputs, meta_name_tensor)
        # store predictions in self for later network visualization
        self.train_epoch_prediction = predicted
        # Calculate loss
        loss, reg_loss = self.loss_func(inputs, predicted, targets)
        # Metrics

Also, in the tutorial network_architecture_visualization.ipynb, in the very last cell, we need to make this change:

instead of:

fig = make_dot(m.train_epoch_prediction, params=dict(m.model.named_parameters()))
# fig_glob.render(filename='img/fig_glob')
display(fig)

we need:

fig = make_dot(m.model.train_epoch_prediction, params=dict(m.model.named_parameters()))
# fig_glob.render(filename='img/fig_glob')
display(fig)

github-actions bot commented on Nov 7, 2022

de87e17

Model Benchmark

| Benchmark | Metric | main | current | diff |
| --- | --- | --- | --- | --- |
| AirPassengers | MAE_val | 85.1099 | 15.2698 | -82.06% |
| AirPassengers | RMSE_val | 108.276 | 19.4209 | -82.06% |
| AirPassengers | Loss_val | nan | 0.00195 | 0.0% |
| AirPassengers | RegLoss_val | nan | 0 | 0.0% |
| AirPassengers | epoch | nan | 89 | 0.0% |
| AirPassengers | MAE | 6.35364 | 9.82902 | 54.7% ⚠️ |
| AirPassengers | RMSE | 7.68085 | 11.7005 | 52.33% ⚠️ |
| AirPassengers | Loss | 0.00023 | 0.00056 | 140.91% ⚠️ |
| AirPassengers | RegLoss | 0 | 0 | 0.0% |
| PeytonManning | MAE_val | 0.92518 | 0.64636 | -30.14% |
| PeytonManning | RMSE_val | 1.13074 | 0.79276 | -29.89% |
| PeytonManning | Loss_val | nan | 0.01494 | 0.0% |
| PeytonManning | RegLoss_val | nan | 0 | 0.0% |
| PeytonManning | epoch | nan | 37 | 0.0% |
| PeytonManning | MAE | 0.34839 | 0.42701 | 22.57% ⚠️ |
| PeytonManning | RMSE | 0.48617 | 0.57032 | 17.31% ⚠️ |
| PeytonManning | Loss | 0.00464 | 0.00635 | 36.95% ⚠️ |
| PeytonManning | RegLoss | 0 | 0 | 0.0% |
| YosemiteTemps | MAE_val | 1.71173 | 1.72949 | 1.04% |
| YosemiteTemps | RMSE_val | 2.2758 | 2.27386 | -0.08% |
| YosemiteTemps | Loss_val | nan | 0.00096 | 0.0% |
| YosemiteTemps | RegLoss_val | nan | 0 | 0.0% |
| YosemiteTemps | epoch | nan | 84 | 0.0% |
| YosemiteTemps | MAE | 1.43672 | 1.45189 | 1.06% |
| YosemiteTemps | RMSE | 2.14749 | 2.16631 | 0.88% |
| YosemiteTemps | Loss | 0.00064 | 0.00066 | 1.81% |
| YosemiteTemps | RegLoss | 0 | 0 | 0.0% |

Model Training

Training plots for PeytonManning, YosemiteTemps, and AirPassengers (plots omitted).

@ourownstory (Owner) left a comment:
LGTM. Great work!!
All points that we discussed can be addressed in later PRs.

@ourownstory merged commit de87e17 into main on Nov 12, 2022
@ourownstory deleted the lightning branch on Nov 12, 2022 at 02:36