Fix CPU bug, overhaul model runner, and update to lightning >=2.0 #176
Conversation
Codecov Report
@@ Coverage Diff @@
## main #176 +/- ##
==========================================
+ Coverage 79.10% 87.63% +8.53%
==========================================
Files 11 12 +1
Lines 804 833 +29
==========================================
+ Hits 636 730 +94
+ Misses 168 103 -65
... and 2 files with indirect coverage changes
All tests passing on Linux, Windows, and macOS CPU runners! 🎉
Very nice changes, I just have a few minor comments.
Thanks @bittremieux! Your turn now @melihyilmaz 🚀
I stumbled upon one issue when testing locally, which should be fixed with my recent commit, but everything else looks great. @wfondrie Can you take a look at the only failing test? I wasn't sure whether we need to tweak the test or my recent commit. Feel free to merge afterwards!
Thanks!
Ok, so I think I have something that'll work pretty robustly: the model weights are loaded onto the current PyTorch default device, which is normally CPU. However, this lets us test it by changing the default device. Also, I went ahead and included the initialization parameters with the model weights, so that loading weights is independent of the configuration provided: the model will always match the loaded weights, except in the event of major architecture changes (Issue #156)
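For readers following along, a minimal sketch of what this pattern looks like with PyTorch Lightning; the class name, init arguments, and the `torch.empty(0).device` trick for querying the default device are illustrative assumptions, not necessarily the exact code in this PR:

```python
import torch
import pytorch_lightning as pl


class Spec2Pep(pl.LightningModule):  # hypothetical stand-in for the real model
    def __init__(self, dim_model: int = 512, n_layers: int = 9):
        super().__init__()
        # save_hyperparameters() stores the init args in the checkpoint, so a
        # later load_from_checkpoint() rebuilds the matching architecture
        # without needing the user-provided configuration.
        self.save_hyperparameters()


# Map the stored weights onto whatever the current default device is
# (normally CPU); tests can change the default device to exercise this path.
device = torch.empty(0).device  # the current default device
model = Spec2Pep.load_from_checkpoint("model.ckpt", map_location=device)
```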
Good thing the integration tests were working, because they helped me catch a bug! Anyway, I also changed the checkpointing behavior to only keep the top 5 checkpoints based on validation CE loss, but changed the
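A hedged sketch of the "keep the top 5 checkpoints by validation CE loss" behavior using Lightning's `ModelCheckpoint` callback; the monitored metric name and directory are assumptions for illustration, not the exact configuration in this PR:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath="checkpoints/",
    monitor="valid_CELoss",  # assumed name for the validation cross-entropy loss
    mode="min",              # lower loss is better
    save_top_k=5,            # keep only the 5 best checkpoints
)

trainer = pl.Trainer(callbacks=[checkpoint_callback])
```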
@melihyilmaz and @bittremieux - any objections to my updates?
@wfondrie, the updates look good to me.
Thanks @wfondrie, I've tested locally and identified only two issues that arose with the recent commits.
No more issues, good to merge!
This is a huge PR that:

- Removes the `no_gpu` parameter, in favor of providing `accelerator` and `devices` parameters to allow users to select custom devices readily without negating `CUDA_VISIBLE_DEVICES` (see the sketch after this list). I think this is the best compromise for Add num_gpus config parameter #173.
- Overhauls `model_runner.py` into a class, `ModelRunner`. I found it annoying to have to change arguments for models and such in multiple spots, so I think this change will make it much more maintainable.
- Updates to lightning >=2.0 and uses `on_predict_batch_end` rather than `on_predict_epoch_end`, because the latter seems to no longer receive the predict results. The newer version of Lightning doesn't allow for dictionary metrics to be logged in the way we were doing before, so please pay attention to the changes in review.

I'm tagging both @bittremieux and @melihyilmaz to review this one since it is so big and I don't want to mess something up 🙈.