Automatically download model weights #68

bittremieux · 2022-08-23T23:40:34Z

Fixes #17. Fixes #75.

Use cached model weights or download them from GitHub.

If no weights file (extension: .ckpt) is available in the cache directory, it will be downloaded from a release asset on GitHub. Model weights are retrieved by matching the release version. If no model weights are available for the identical release (major, minor, patch), weights from an alternative release with a matching (i) major and minor version, or (ii) major version will be used. If no matching release can be found, no model weights will be downloaded.
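The fallback order described above can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `parse_version` and `pick_release` are hypothetical, and preferring the newest release among equally good matches is an assumption.

```python
import re
from typing import Iterable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from strings such as 'v3.1.0' or '3.1.0'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def pick_release(current: str, releases: Iterable[str]) -> Optional[str]:
    """Pick the release tag whose weights should be used for `current`.

    Tries an identical (major, minor, patch) match first, then a matching
    (major, minor), then a matching major version. Returns None if no
    release matches, in which case nothing would be downloaded.
    """
    wanted = parse_version(current)
    parsed = [(parse_version(tag), tag) for tag in releases]
    # Compare the first 3, then 2, then 1 version components.
    for n_components in (3, 2, 1):
        candidates = [
            (version, tag)
            for version, tag in parsed
            if version[:n_components] == wanted[:n_components]
        ]
        if candidates:
            # Assumption: prefer the newest release among the matches.
            return max(candidates)[1]
    return None
```

For example, `pick_release("3.1.2", ["v3.1.0", "v3.0.0"])` falls back to the major and minor match `"v3.1.0"`, while `pick_release("4.0.0", ["v3.1.0"])` returns `None`.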

Note that the GitHub API is limited to 60 unauthenticated requests per hour from the same IP address. A log message provides instructions on how to explicitly specify the model file in subsequent runs.
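As a rough sketch of how an exhausted rate limit could be recognized: the GitHub REST API answers with status 403 (or 429) and an `X-RateLimit-Remaining: 0` header once the quota is used up. The helper name `is_rate_limited` is hypothetical and is not taken from this PR.

```python
from typing import Mapping


def is_rate_limited(status: int, headers: Mapping[str, str]) -> bool:
    """Return True if a GitHub REST API response signals an exhausted rate limit."""
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return status in (403, 429) and lowered.get("x-ratelimit-remaining") == "0"
```

A caller could use this check to skip the download and log a hint to pass the weights file explicitly, instead of failing with an unexplained HTTP error.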

Review: @wsnoble to check whether this is the desired behavior for the users and the documentation, @wfondrie to check the code.


wfondrie

This looks like it solves the problem to me. It's a bit complicated and I think it'd be worthwhile to create unit tests for it, perhaps by mocking a response from GitHub.
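One way such a test could look, as a sketch only: it uses the standard library's `urllib` and `unittest.mock` rather than whatever client this PR actually uses, and both the `list_release_tags` helper and the repository URL are assumptions made for illustration.

```python
import io
import json
import urllib.request
from unittest import mock

# Assumed URL of the GitHub releases endpoint for this repository.
RELEASES_URL = "https://api.github.com/repos/Noble-Lab/casanovo/releases"


def list_release_tags(url: str = RELEASES_URL) -> list:
    """Return the tag names of all releases reported by the GitHub API."""
    with urllib.request.urlopen(url) as response:
        return [release["tag_name"] for release in json.load(response)]


def test_list_release_tags():
    # Canned JSON body standing in for the real GitHub response.
    fake_body = json.dumps(
        [{"tag_name": "v3.0.0"}, {"tag_name": "v2.1.1"}]
    ).encode()
    fake_response = mock.MagicMock()
    fake_response.__enter__.return_value = io.BytesIO(fake_body)
    # Patch urlopen so that no network request is made during the test.
    with mock.patch(
        "urllib.request.urlopen", return_value=fake_response
    ) as urlopen:
        assert list_release_tags() == ["v3.0.0", "v2.1.1"]
        urlopen.assert_called_once_with(RELEASES_URL)
```

Because the response is canned, the test neither depends on network access nor counts against the API rate limit.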

Also, do we have a way to verify that the architecture (number of layers, layer dim, etc) matches the downloaded weights, or to automatically set it?

casanovo/casanovo.py


wfondrie · 2022-10-11T19:03:10Z

It looks like ~~checking the version is failing in GH Actions~~ regex processing the version is not working 🤔

bittremieux · 2022-10-11T20:53:34Z

I see you removed the psutil dependency. This is necessary to correctly set resources on a shared system, such as the cluster. os.cpu_count() gives the number of CPUs that are present, whereas psutil gives the number of CPUs that can be used. Thus, if you requested only a specific number of CPUs, using psutil rather than os.cpu_count() makes sure that there isn't an excessive number of worker threads (extreme case: on n032, os.cpu_count() would lead to 40 worker threads when the job only has 1 CPU).
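The difference can be sketched as follows. The function name `usable_cpu_count` is hypothetical and this is not the code from the PR; the sketch treats psutil as optional and falls back to the standard library when it is missing.

```python
import os

try:
    import psutil  # third-party; treated as optional in this sketch
except ImportError:
    psutil = None


def usable_cpu_count() -> int:
    """Number of CPUs this process may run on, not the number installed."""
    if psutil is not None:
        try:
            # Respects the CPU affinity set by a job scheduler or taskset.
            return len(psutil.Process().cpu_affinity())
        except AttributeError:
            # cpu_affinity() is not available on every platform (e.g. macOS).
            pass
    if hasattr(os, "sched_getaffinity"):
        # Standard library equivalent, only available on some Unix platforms.
        return len(os.sched_getaffinity(0))
    # Last resort: all CPUs that are present in the machine.
    return os.cpu_count() or 1
```

On a cluster node with 40 CPUs where the job was granted a single CPU via CPU affinity, this would return 1, whereas `os.cpu_count()` would still report 40.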

codecov · 2022-10-11T23:08:59Z

Codecov Report

Merging #68 (aec590a) into main (a5c05b8) will increase coverage by 54.01%.
The diff coverage is 94.38%.

@@             Coverage Diff             @@
##             main      #68       +/-   ##
===========================================
+ Coverage   14.31%   68.32%   +54.01%     
===========================================
  Files           9       10        +1     
  Lines         559      644       +85     
===========================================
+ Hits           80      440      +360     
+ Misses        479      204      -275

| Impacted Files | Coverage Δ | |
|---|---|---|
| casanovo/denovo/dataloaders.py | `85.18% <ø> (+85.18%)` | ⬆️ |
| casanovo/denovo/model.py | `61.15% <ø> (+24.46%)` | ⬆️ |
| casanovo/denovo/model_runner.py | `49.53% <85.71%> (+49.53%)` | ⬆️ |
| casanovo/casanovo.py | `89.13% <94.82%> (+89.13%)` | ⬆️ |
| casanovo/utils.py | `100.00% <100.00%> (ø)` | |
| casanovo/data/ms_io.py | `96.15% <0.00%> (+75.00%)` | ⬆️ |
| casanovo/data/datasets.py | `82.75% <0.00%> (+82.75%)` | ⬆️ |

... and 1 more

wfondrie · 2022-10-11T23:12:48Z

Now we're getting a multiprocessing error on Windows 🤦‍♂️

* Download model weights from GitHub release
* Include dependencies
* Update model usage documentation
* Reformat with black
* Download weights to the OS-specific app dir
* Don't download weights if already in cache dir
* Update model file instructions
* Remove release notes from the README
  We have this information on the Releases page now.
* Remove explicit model specification from example commands
* Harmonize default parameters and config values
  As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).
* No need to specify config file by default
  This simplifies the examples that most users will want to use.
* Simplify version matching regex
* Remove depthcharge related tests
  The transformer tests only deal with depthcharge functionality and just seem copied from its repository.
* Make sure that package data is included
  I.e. the config YAML file.
* Remove obsolote (ppx) tests
* Update integration test
* Add MacOS support and support for Apple's MPS chips
* Fail test but print version
* Added n_worker fn and tests
* Create split_version fn and add unit tests
* Fix debugging unit test
* Explicitly set version
* Monkeypatch loaded version
* Add device selector, so that on CPU-only runs the devices > 0
* Add windows patch
* Fix typo
* Revert
* Use main process for data loading on Windows
* Fix typo
* Fix unit test
* Fix devices for when num_workers == 0
* Fix devices for when num_workers == 0
* Minor README updates
* Import reordering
* Minor code and docstring reformatting
* Test model weights retrieval
* Fix getting the number of devices
* Disable excessive Tensorboard deprecation warnings
* Don't use worker threads on MacOS
  It crashes the DataLoader: pytorch/pytorch#70344
* Warnings need to be ignored before import
* Additional weights tests
  - Non-matching version
  - GitHub rate limit exceeded
* Disable tests on MacOS
* Include Python 3.10 as supported version

Co-authored-by: William Fondrie <[email protected]>
Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>

* Add beam search
* Delete print statements
* Automatically download model weights (#68) (#88)
* Automatically download model weights (#68) (#89)
* Break beam search to testable subfunctions
* Fix precursor m/z termination and filtering
* Add unit testing for beam search
* Add beamsearch comments and fix formatting
* Address requested changes and minor fixes
* Add more unit tests for beam search
* Check NH3 loss for early stopping
* Consistent parameter order
* Update docstrings
* Remove unused precursors parameter
* Update beam matching mask in a level higher
* Minor refactoring to avoid code duplication
* Update imports
* Simplification refactoring
* Fix unit tests
* Simplify predicted peptide caching
* Simplify predicted peptide caching
* Simplify predicted peptide caching
* Unify predicted peptide caching
* Restrict tensor reshape to subfunction and minor fixes
* Finish beams when all isotopes exceed the precursor m/z tolerance
* Generalize look-ahead for tokens with negative mass
* Remove greedy decoding functionality
* Handle case with unfinished beams and add test
* Upgrade required depthcharge version
* Use detokenize function
* Add test for negative mass-aware termination
* Fix egative mass-aware beam termination
* Minor refactoring
* Add test for dummy output at max length
* Fixed and refactored peptide and scocre mzTab outputs
* Add tests for peptide and score output formatting
* Small fixes
* Update changelog
* Fix changelog update

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>
Co-authored-by: Wout Bittremieux <[email protected]>

bittremieux added 4 commits August 23, 2022 16:08

Download model weights from GitHub release

21e7487

Include dependencies

a6467ee

Update model usage documentation

2949861

Reformat with black

7408805

bittremieux requested review from wfondrie and wsnoble August 23, 2022 23:40

Base automatically changed from fix to main August 24, 2022 03:01

bittremieux added 7 commits August 24, 2022 09:32

Download weights to the OS-specific app dir

1c7b1bd

Don't download weights if already in cache dir

325d050

Update model file instructions

2880967

Remove release notes from the README

d5e0244

We have this information on the Releases page now.

Remove explicit model specification from example commands

6d2aa38

Harmonize default parameters and config values

1692936

As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).

No need to specify config file by default

a14f785

This simplifies the examples that most users will want to use.

bittremieux removed the request for review from wsnoble August 25, 2022 19:52

wfondrie requested changes Aug 27, 2022

casanovo/casanovo.py (outdated, resolved)

bittremieux and others added 7 commits August 27, 2022 16:26

Merge pull request #69 from Noble-Lab/config

84ea01a

Simplify config

Simplify version matching regex

1688d68

Remove depthcharge related tests

904c7fd

The transformer tests only deal with depthcharge functionality and just seem copied from its repository.

Make sure that package data is included

96e8c24

I.e. the config YAML file.

Merge remote-tracking branch 'origin/main' into weights

681986f

Remove obsolote (ppx) tests

965a04a

Update integration test

29a0c36

bittremieux mentioned this pull request Oct 3, 2022

Update for PyPI release #79

Merged

wfondrie added a commit that referenced this pull request Oct 3, 2022

Migrate tests from #68

1f458fd

Resolve merge conflicts

745b0ce

wfondrie added 2 commits October 11, 2022 13:49

Add MacOS support and support for Apple's MPS chips

aa7f47f

Fail test but print version

5292da9

Add device selector, so that on CPU-only runs the devices > 0

ea02de0

wfondrie added 8 commits October 25, 2022 16:12

Add windows patch

9e03cc9

Fix typo

b055e6d

Revert

a3645fd

Use main process for data loading on Windows

2bb3a55

Fix typo

683ebbb

Fix unit test

c275127

Fix devices for when num_workers == 0

7057600

Fix devices for when num_workers == 0

58e4ce1

wfondrie approved these changes Oct 26, 2022

bittremieux added 12 commits November 2, 2022 10:40

Minor README updates

7115c2d

Import reordering

22ce3bf

Minor code and docstring reformatting

8f00696

Test model weights retrieval

af407fa

Merge remote-tracking branch 'origin/main' into weights

98f242d

Fix getting the number of devices

767acd4

Disable excessive Tensorboard deprecation warnings

1e8c655

Don't use worker threads on MacOS

e922b76

It crashes the DataLoader: pytorch/pytorch#70344

Warnings need to be ignored before import

b7188d6

Additional weights tests

e7f7df6

- Non-matching version - GitHub rate limit exceeded

Disable tests on MacOS

d6fc99b

Include Python 3.10 as supported version

aec590a

bittremieux merged commit f3696ca into main Nov 3, 2022

bittremieux deleted the weights branch November 3, 2022 02:59

melihyilmaz added a commit that referenced this pull request Nov 5, 2022

Automatically download model weights (#68)

bab7072
