Automatically download model weights #68

Merged
merged 47 commits into from
Nov 3, 2022

bittremieux
Collaborator

@bittremieux bittremieux commented Aug 23, 2022

Fixes #17. Fixes #75.

Use cached model weights or download them from GitHub.

If no weights file (extension: .ckpt) is available in the cache directory, it will be downloaded from a release asset on GitHub. Model weights are retrieved by matching the release version. If no model weights are available for an identical release (major, minor, patch), alternative releases with matching (i) major and minor versions, or (ii) major version will be used. If no matching release can be found, no model weights will be downloaded.
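
The fallback order described above can be sketched as follows. This is only an illustration and not the code in this PR: `pick_release` is a hypothetical helper, the `split_version` body is an assumed implementation, and preferring the newest release among equally specific matches is an assumption as well.

```python
import re
from typing import Optional, Sequence, Tuple


def split_version(version: str) -> Optional[Tuple[int, int, int]]:
    """Parse 'v3.1.0' or '3.1.0rc1' into (major, minor, patch); None if unparsable."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def pick_release(current: str, available: Sequence[str]) -> Optional[str]:
    """Return the release tag whose weights should be used, or None if nothing matches."""
    target = split_version(current)
    if target is None:
        return None
    parsed = [(tag, split_version(tag)) for tag in available]
    parsed = [(tag, ver) for tag, ver in parsed if ver is not None]
    # Try progressively looser matches: major.minor.patch, then major.minor, then major.
    for n_parts in (3, 2, 1):
        candidates = [
            (ver, tag) for tag, ver in parsed if ver[:n_parts] == target[:n_parts]
        ]
        if candidates:
            # Assumption: among equally specific matches, prefer the newest release.
            return max(candidates)[1]
    return None


tags = ["v2.1.0", "v3.0.0", "v3.0.2", "v3.1.0"]
print(pick_release("3.1.0", tags))  # v3.1.0 (identical release)
print(pick_release("3.0.5", tags))  # v3.0.2 (matching major and minor)
print(pick_release("3.2.0", tags))  # v3.1.0 (matching major only)
print(pick_release("4.0.0", tags))  # None (no weights are downloaded)
```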

Note that the GitHub API is limited to 60 requests from the same IP per hour. A log message provides instructions to explicitly specify the model file for subsequent uses.
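
As a rough sketch of how a caller could recognize that the rate limit was hit: a rate-limited GitHub REST API response has status 403 or 429, an `x-ratelimit-remaining` header of `0`, and an `x-ratelimit-reset` header with the reset time in epoch seconds. The helper name `rate_limit_wait` is hypothetical and this is not how the PR itself handles the error.

```python
import time
from typing import Mapping, Optional


def rate_limit_wait(
    status: int, headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Seconds until the limit resets if the response was rate limited, else None."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    if status not in (403, 429) or lowered.get("x-ratelimit-remaining") != "0":
        return None
    now = time.time() if now is None else now
    # If the reset header is missing, fall back to "no wait known".
    reset = float(lowered.get("x-ratelimit-reset", now))
    return max(0.0, reset - now)
```

When this returns a number, the download can be skipped and the user pointed to specifying the model file explicitly, as the log message does.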

Review: @wsnoble to check whether this is the desired behavior for the users and the documentation, @wfondrie to check the code.

Base automatically changed from fix to main August 24, 2022 03:01
@bittremieux bittremieux removed the request for review from wsnoble August 25, 2022 19:52
Collaborator

@wfondrie wfondrie left a comment

This looks like it solves the problem to me. It's a bit complicated and I think it'd be worthwhile to create unit tests for it, perhaps by mocking a response from GitHub.
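
One way such a test could look, as a minimal sketch: it assumes the release lookup goes through `urllib.request.urlopen` and the public `releases/latest` endpoint, which may not be how the PR queries GitHub. `latest_release_tag` and `"owner/project"` are hypothetical placeholders.

```python
import io
import json
import unittest.mock
import urllib.request


def latest_release_tag(repo: str) -> str:
    """Fetch the tag name of the latest release of `repo` ('owner/name')."""
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["tag_name"]


def test_latest_release_tag():
    # A BytesIO object stands in for the HTTP response; no network access occurs.
    fake_response = io.BytesIO(b'{"tag_name": "v3.0.0"}')
    with unittest.mock.patch(
        "urllib.request.urlopen", return_value=fake_response
    ) as urlopen:
        assert latest_release_tag("owner/project") == "v3.0.0"
    urlopen.assert_called_once()
```

Because the response is canned, the test is deterministic and does not count against the GitHub API rate limit.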

Also, do we have a way to verify that the architecture (number of layers, layer dim, etc) matches the downloaded weights, or to automatically set it?

wfondrie added a commit that referenced this pull request Oct 3, 2022
@wfondrie
Collaborator

wfondrie commented Oct 11, 2022

It looks like checking the version is failing in GH Actions: the regex that processes the version is not working 🤔

@bittremieux
Collaborator Author

I see you removed the psutil dependency. It is necessary to correctly set resources on a shared system, such as the cluster. os.cpu_count() gives the number of CPUs that are present, whereas psutil gives the number of CPUs that can be used. Thus, if you requested only a specific number of CPUs, psutil makes sure that there isn't an excessive number of worker threads, whereas os.cpu_count() would overshoot (extreme case: on n032 it would use 40 worker threads when the job only has 1 CPU).
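
To illustrate the difference, here is a small sketch. `usable_cpus` is a hypothetical helper, not the function in this PR. It prefers psutil's `Process().cpu_affinity()`, which is not available on every platform, and falls back to the standard library.

```python
import os


def usable_cpus() -> int:
    """Number of CPUs this process is allowed to run on (at least 1)."""
    try:
        import psutil

        # CPUs the scheduler lets this process use, e.g. those granted to a cluster job.
        return len(psutil.Process().cpu_affinity())
    except (ImportError, AttributeError):
        # psutil is not installed, or cpu_affinity() is unavailable on this platform.
        pass
    if hasattr(os, "sched_getaffinity"):
        # Standard library equivalent, available on some Unix platforms.
        return len(os.sched_getaffinity(0))
    # Last resort: all CPUs present, which may exceed what the job was granted.
    return os.cpu_count() or 1


print(f"present: {os.cpu_count()}, usable: {usable_cpus()}")
```

On a shared node the two numbers can differ widely, which is the case described above.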

@codecov

codecov bot commented Oct 11, 2022

Codecov Report

Merging #68 (aec590a) into main (a5c05b8) will increase coverage by 54.01%.
The diff coverage is 94.38%.

@@             Coverage Diff             @@
##             main      #68       +/-   ##
===========================================
+ Coverage   14.31%   68.32%   +54.01%     
===========================================
  Files           9       10        +1     
  Lines         559      644       +85     
===========================================
+ Hits           80      440      +360     
+ Misses        479      204      -275     
| Impacted Files | Coverage Δ |
|---|---|
| casanovo/denovo/dataloaders.py | 85.18% <ø> (+85.18%) ⬆️ |
| casanovo/denovo/model.py | 61.15% <ø> (+24.46%) ⬆️ |
| casanovo/denovo/model_runner.py | 49.53% <85.71%> (+49.53%) ⬆️ |
| casanovo/casanovo.py | 89.13% <94.82%> (+89.13%) ⬆️ |
| casanovo/utils.py | 100.00% <100.00%> (ø) |
| casanovo/data/ms_io.py | 96.15% <0.00%> (+75.00%) ⬆️ |
| casanovo/data/datasets.py | 82.75% <0.00%> (+82.75%) ⬆️ |

... and 1 more

@wfondrie
Collaborator

Now we're getting a multiprocessing error on Windows 🤦‍♂️

@bittremieux bittremieux merged commit f3696ca into main Nov 3, 2022
@bittremieux bittremieux deleted the weights branch November 3, 2022 02:59
melihyilmaz added a commit that referenced this pull request Nov 3, 2022
* Download model weights from GitHub release

* Include dependencies

* Update model usage documentation

* Reformat with black

* Download weights to the OS-specific app dir

* Don't download weights if already in cache dir

* Update model file instructions

* Remove release notes from the README

We have this information on the Releases page now.

* Remove explicit model specification from example commands

* Harmonize default parameters and config values

As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).

* No need to specify config file by default

This simplifies the examples that most users will want to use.

* Simplify version matching regex

* Remove depthcharge related tests

The transformer tests only deal with depthcharge functionality and just seem copied from its repository.

* Make sure that package data is included

I.e. the config YAML file.

* Remove obsolete (ppx) tests

* Update integration test

* Add MacOS support and support for Apple's MPS chips

* Fail test but print version

* Added n_worker fn and tests

* Create split_version fn and add unit tests

* Fix debugging unit test

* Explicitly set version

* Monkeypatch loaded version

* Add device selector, so that on CPU-only runs the devices > 0

* Add windows patch

* Fix typo

* Revert

* Use main process for data loading on Windows

* Fix typo

* Fix unit test

* Fix devices for when num_workers == 0

* Fix devices for when num_workers == 0

* Minor README updates

* Import reordering

* Minor code and docstring reformatting

* Test model weights retrieval

* Fix getting the number of devices

* Disable excessive Tensorboard deprecation warnings

* Don't use worker threads on MacOS

It crashes the DataLoader: pytorch/pytorch#70344

* Warnings need to be ignored before import

* Additional weights tests

- Non-matching version
- GitHub rate limit exceeded

* Disable tests on MacOS

* Include Python 3.10 as supported version

Co-authored-by: William Fondrie <[email protected]>

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>
melihyilmaz added a commit that referenced this pull request Nov 3, 2022
melihyilmaz added a commit that referenced this pull request Nov 5, 2022
bittremieux added a commit that referenced this pull request Nov 18, 2022
* Add beam search

* Delete print statements

* Automatically download model weights (#68) (#88)

* Download model weights from GitHub release

* Include dependencies

* Update model usage documentation

* Reformat with black

* Download weights to the OS-specific app dir

* Don't download weights if already in cache dir

* Update model file instructions

* Remove release notes from the README

We have this information on the Releases page now.

* Remove explicit model specification from example commands

* Harmonize default parameters and config values

As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).

* No need to specify config file by default

This simplifies the examples that most users will want to use.

* Simplify version matching regex

* Remove depthcharge related tests

The transformer tests only deal with depthcharge functionality and just seem copied from its repository.

* Make sure that package data is included

I.e. the config YAML file.

* Remove obsolete (ppx) tests

* Update integration test

* Add MacOS support and support for Apple's MPS chips

* Fail test but print version

* Added n_worker fn and tests

* Create split_version fn and add unit tests

* Fix debugging unit test

* Explicitly set version

* Monkeypatch loaded version

* Add device selector, so that on CPU-only runs the devices > 0

* Add windows patch

* Fix typo

* Revert

* Use main process for data loading on Windows

* Fix typo

* Fix unit test

* Fix devices for when num_workers == 0

* Fix devices for when num_workers == 0

* Minor README updates

* Import reordering

* Minor code and docstring reformatting

* Test model weights retrieval

* Fix getting the number of devices

* Disable excessive Tensorboard deprecation warnings

* Don't use worker threads on MacOS

It crashes the DataLoader: pytorch/pytorch#70344

* Warnings need to be ignored before import

* Additional weights tests

- Non-matching version
- GitHub rate limit exceeded

* Disable tests on MacOS

* Include Python 3.10 as supported version

Co-authored-by: William Fondrie <[email protected]>

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>

* Automatically download model weights (#68) (#89)

* Download model weights from GitHub release

* Include dependencies

* Update model usage documentation

* Reformat with black

* Download weights to the OS-specific app dir

* Don't download weights if already in cache dir

* Update model file instructions

* Remove release notes from the README

We have this information on the Releases page now.

* Remove explicit model specification from example commands

* Harmonize default parameters and config values

As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).

* No need to specify config file by default

This simplifies the examples that most users will want to use.

* Simplify version matching regex

* Remove depthcharge related tests

The transformer tests only deal with depthcharge functionality and just seem copied from its repository.

* Make sure that package data is included

I.e. the config YAML file.

* Remove obsolete (ppx) tests

* Update integration test

* Add MacOS support and support for Apple's MPS chips

* Fail test but print version

* Added n_worker fn and tests

* Create split_version fn and add unit tests

* Fix debugging unit test

* Explicitly set version

* Monkeypatch loaded version

* Add device selector, so that on CPU-only runs the devices > 0

* Add windows patch

* Fix typo

* Revert

* Use main process for data loading on Windows

* Fix typo

* Fix unit test

* Fix devices for when num_workers == 0

* Fix devices for when num_workers == 0

* Minor README updates

* Import reordering

* Minor code and docstring reformatting

* Test model weights retrieval

* Fix getting the number of devices

* Disable excessive Tensorboard deprecation warnings

* Don't use worker threads on MacOS

It crashes the DataLoader: pytorch/pytorch#70344

* Warnings need to be ignored before import

* Additional weights tests

- Non-matching version
- GitHub rate limit exceeded

* Disable tests on MacOS

* Include Python 3.10 as supported version

Co-authored-by: William Fondrie <[email protected]>

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>

* Break beam search to testable subfunctions

* Fix precursor m/z termination and filtering

* Add unit testing for beam search

* Add beamsearch comments and fix formatting

* Address requested changes and minor fixes

* Add more unit tests for beam search

* Check NH3 loss for early stopping

* Consistent parameter order

* Update docstrings

* Remove unused precursors parameter

* Update beam matching mask in a level higher

* Minor refactoring to avoid code duplication

* Update imports

* Simplification refactoring

* Fix unit tests

* Simplify predicted peptide caching

* Simplify predicted peptide caching

* Simplify predicted peptide caching

* Unify predicted peptide caching

* Restrict tensor reshape to subfunction and minor fixes

* Finish beams when all isotopes exceed the precursor m/z tolerance

* Generalize look-ahead for tokens with negative mass

* Remove greedy decoding functionality

* Handle case with unfinished beams and add test

* Upgrade required depthcharge version

* Use detokenize function

* Add test for negative mass-aware termination

* Fix negative mass-aware beam termination

* Minor refactoring

* Add test for dummy output at max length

* Fixed and refactored peptide and score mzTab outputs

* Add tests for peptide and score output formatting

* Small fixes

* Update changelog

* Fix changelog update

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>
Co-authored-by: Wout Bittremieux <[email protected]>