Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add beam search peptide decoding #87

Merged
merged 44 commits into from
Nov 18, 2022
Merged

Add beam search peptide decoding #87

merged 44 commits into from
Nov 18, 2022

Conversation

melihyilmaz
Copy link
Collaborator

@melihyilmaz melihyilmaz commented Nov 2, 2022

Add beam search decoding to replace greedy decoding. Organized beam_search_decode() as a series of sub-functions to allow for unit testing each.

From a larger group of peptides predictions, this implementation of beam search decoding caches k highest scoring peptide predictions for each spectrum, prioritizing the peptides fitting the observed precursor mass. As an output, the highest scoring peptide within precursor m/z tolerance, is returned, i.e. a single PSM for each spectra is recorded in the output mzTab file. If there are no cached peptides within precursor m/z tolerance for a spectrum, the highest scoring peptide is returned in the output.

Also fixed amino acid-level score calculation in model.on_predict_epoch_end(), which was previously retrieving a list of shifted-by-one-AA amino acid scores, i.e. first AA prediction being assigned the second score, second AA the third score etc. where last AA was assigned the score for the stop toke.

@codecov
Copy link

codecov bot commented Nov 2, 2022

Codecov Report

Merging #87 (f4fa6c8) into main (a810175) will increase coverage by 3.92%.
The diff coverage is 97.51%.

@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   75.15%   79.08%   +3.92%     
==========================================
  Files          10       10              
  Lines         644      784     +140     
==========================================
+ Hits          484      620     +136     
- Misses        160      164       +4     
Impacted Files Coverage Δ
casanovo/casanovo.py 89.13% <ø> (ø)
casanovo/denovo/model_runner.py 49.53% <ø> (ø)
casanovo/denovo/model.py 79.06% <97.41%> (+17.91%) ⬆️
casanovo/denovo/evaluate.py 86.15% <100.00%> (+0.43%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

melihyilmaz and others added 7 commits November 2, 2022 22:20
* Download model weights from GitHub release

* Include dependencies

* Update model usage documentation

* Reformat with black

* Download weights to the OS-specific app dir

* Don't download weights if already in cache dir

* Update model file instructions

* Remove release notes from the README

We have this information on the Releases page now.

* Remove explicit model specification from example commands

* Harmonize default parameters and config values

As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).

* No need to specify config file by default

This simplifies the examples that most users will want to use.

* Simplify version matching regex

* Remove depthcharge related tests

The transformer tests only deal with depthcharge functionality and just seem copied from its repository.

* Make sure that package data is included

I.e. the config YAML file.

* Remove obsolote (ppx) tests

* Update integration test

* Add MacOS support and support for Apple's MPS chips

* Fail test but print version

* Added n_worker fn and tests

* Create split_version fn and add unit tests

* Fix debugging unit test

* Explicitly set version

* Monkeypatch loaded version

* Add device selector, so that on CPU-only runs the devices > 0

* Add windows patch

* Fix typo

* Revert

* Use main process for data loading on Windows

* Fix typo

* Fix unit test

* Fix devices for when num_workers == 0

* Fix devices for when num_workers == 0

* Minor README updates

* Import reordering

* Minor code and docstring reformatting

* Test model weights retrieval

* Fix getting the number of devices

* Disable excessive Tensorboard deprecation warnings

* Don't use worker threads on MacOS

It crashes the DataLoader: pytorch/pytorch#70344

* Warnings need to be ignored before import

* Additional weights tests

- Non-matching version
- GitHub rate limit exceeded

* Disable tests on MacOS

* Include Python 3.10 as supported version

Co-authored-by: William Fondrie <[email protected]>

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>
* Download model weights from GitHub release

* Include dependencies

* Update model usage documentation

* Reformat with black

* Download weights to the OS-specific app dir

* Don't download weights if already in cache dir

* Update model file instructions

* Remove release notes from the README

We have this information on the Releases page now.

* Remove explicit model specification from example commands

* Harmonize default parameters and config values

As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).

* No need to specify config file by default

This simplifies the examples that most users will want to use.

* Simplify version matching regex

* Remove depthcharge related tests

The transformer tests only deal with depthcharge functionality and just seem copied from its repository.

* Make sure that package data is included

I.e. the config YAML file.

* Remove obsolote (ppx) tests

* Update integration test

* Add MacOS support and support for Apple's MPS chips

* Fail test but print version

* Added n_worker fn and tests

* Create split_version fn and add unit tests

* Fix debugging unit test

* Explicitly set version

* Monkeypatch loaded version

* Add device selector, so that on CPU-only runs the devices > 0

* Add windows patch

* Fix typo

* Revert

* Use main process for data loading on Windows

* Fix typo

* Fix unit test

* Fix devices for when num_workers == 0

* Fix devices for when num_workers == 0

* Minor README updates

* Import reordering

* Minor code and docstring reformatting

* Test model weights retrieval

* Fix getting the number of devices

* Disable excessive Tensorboard deprecation warnings

* Don't use worker threads on MacOS

It crashes the DataLoader: pytorch/pytorch#70344

* Warnings need to be ignored before import

* Additional weights tests

- Non-matching version
- GitHub rate limit exceeded

* Disable tests on MacOS

* Include Python 3.10 as supported version

Co-authored-by: William Fondrie <[email protected]>

Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>
@melihyilmaz melihyilmaz marked this pull request as ready for review November 6, 2022 17:57
Copy link
Collaborator

@bittremieux bittremieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. This is non-trivial functionality to implement.

Here is some initial feedback. I'll probably want to do another thorough review after these comments have been addressed.

Some comments are relevant to multiple places in the code, but I haven't repeated the same thing multiple times, so be a bit mindful of that.

casanovo/casanovo.py Outdated Show resolved Hide resolved
casanovo/config.yaml Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
tests/test_unit.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@bittremieux bittremieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking pretty good. I have a few comments / requests for clarification. I especially don't understand the latest NH3 loss-related changes.

casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
casanovo/denovo/model.py Outdated Show resolved Hide resolved
@melihyilmaz
Copy link
Collaborator Author

Added a small fix to negative mass-aware termination (to ensure we're not terminating a beam if doesn't exceed tolerance under any negative mass token) and unit tests covering that functionality. Only that and 1 other unresolved conversation above remains.

@bittremieux bittremieux merged commit dbfabb5 into main Nov 18, 2022
@bittremieux bittremieux deleted the beamsearch_melih branch November 18, 2022 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants