Add beam search peptide decoding #87
Conversation
Codecov Report
@@            Coverage Diff             @@
##             main      #87      +/-   ##
==========================================
+ Coverage   75.15%   79.08%   +3.92%
==========================================
  Files          10       10
  Lines         644      784     +140
==========================================
+ Hits          484      620     +136
- Misses        160      164       +4
* Download model weights from GitHub release
* Include dependencies
* Update model usage documentation
* Reformat with black
* Download weights to the OS-specific app dir
* Don't download weights if already in cache dir
* Update model file instructions
* Remove release notes from the README. We have this information on the Releases page now.
* Remove explicit model specification from example commands
* Harmonize default parameters and config values. As per discussion on Slack (https://noblelab.slack.com/archives/C01MXN4NWMP/p1659803053573279).
* No need to specify config file by default. This simplifies the examples that most users will want to use.
* Simplify version matching regex
* Remove depthcharge related tests. The transformer tests only deal with depthcharge functionality and just seem copied from its repository.
* Make sure that package data is included, i.e. the config YAML file.
* Remove obsolete (ppx) tests
* Update integration test
* Add MacOS support and support for Apple's MPS chips
* Fail test but print version
* Added n_worker fn and tests
* Create split_version fn and add unit tests
* Fix debugging unit test
* Explicitly set version
* Monkeypatch loaded version
* Add device selector, so that on CPU-only runs the devices > 0
* Add windows patch
* Fix typo
* Revert
* Use main process for data loading on Windows
* Fix typo
* Fix unit test
* Fix devices for when num_workers == 0
* Fix devices for when num_workers == 0
* Minor README updates
* Import reordering
* Minor code and docstring reformatting
* Test model weights retrieval
* Fix getting the number of devices
* Disable excessive Tensorboard deprecation warnings
* Don't use worker threads on MacOS. It crashes the DataLoader: pytorch/pytorch#70344
* Warnings need to be ignored before import
* Additional weights tests: non-matching version; GitHub rate limit exceeded
* Disable tests on MacOS
* Include Python 3.10 as supported version

Co-authored-by: William Fondrie <[email protected]>
Co-authored-by: Wout Bittremieux <[email protected]>
Co-authored-by: William Fondrie <[email protected]>
Nice work. This is non-trivial functionality to implement.
Here is some initial feedback. I'll probably want to do another thorough review after these comments have been addressed.
Some comments are relevant to multiple places in the code, but I haven't repeated the same thing multiple times, so be a bit mindful of that.
It's looking pretty good. I have a few comments / requests for clarification. I especially don't understand the latest NH3 loss-related changes.
Added a small fix to negative mass-aware termination (to ensure we're not terminating a beam if it doesn't exceed the tolerance under any negative-mass token) and unit tests covering that functionality. Only that and one other unresolved conversation above remain.
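To make the termination rule concrete, here is a minimal sketch of a negative-mass-aware check; the function name, arguments, and ppm-based tolerance are illustrative assumptions, not the actual Casanovo implementation.

```python
def beam_exceeds_precursor(
    current_mass: float,
    precursor_mass: float,
    tol_ppm: float,
    neg_mass_tokens: dict[str, float],
) -> bool:
    """Hypothetical check: flag a beam as over the precursor tolerance only
    if it remains over even after appending any single negative-mass token
    (e.g. an NH3-loss modification)."""

    def delta_ppm(mass: float) -> float:
        # Signed mass error relative to the precursor, in ppm.
        return (mass - precursor_mass) / precursor_mass * 1e6

    # Consider the beam as-is plus each hypothetical negative-mass extension.
    candidate_masses = [current_mass] + [
        current_mass + delta for delta in neg_mass_tokens.values()
    ]
    # Terminate for overshooting only if every candidate is still above the
    # upper tolerance bound.
    return all(delta_ppm(mass) > tol_ppm for mass in candidate_masses)
```

Under this sketch, a beam that could still be brought within tolerance by some negative-mass token keeps decoding instead of being terminated prematurely.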
Add beam search decoding to replace greedy decoding. beam_search_decode() is organized as a series of sub-functions so that each can be unit tested. From a larger pool of candidate peptide predictions, this beam search implementation caches the k highest-scoring peptides for each spectrum, prioritizing peptides that fit the observed precursor mass. As output, the highest-scoring peptide within the precursor m/z tolerance is returned, i.e. a single PSM per spectrum is recorded in the output mzTab file. If a spectrum has no cached peptides within the precursor m/z tolerance, its highest-scoring peptide is returned instead.
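As an illustration of the selection rule described above, the following sketch shows how a single PSM could be picked from a spectrum's top-k cache; the CachedPrediction layout and select_output_psm name are hypothetical and not the actual data structures used in beam_search_decode().

```python
from typing import NamedTuple, Optional


class CachedPrediction(NamedTuple):
    # Hypothetical cache entry for one candidate peptide of a spectrum.
    score: float
    peptide: str
    fits_precursor: bool  # Within the precursor m/z tolerance?


def select_output_psm(cache: list[CachedPrediction]) -> Optional[CachedPrediction]:
    """Pick the single PSM reported for a spectrum from its top-k cache."""
    if not cache:
        return None
    # Prefer the highest-scoring peptide that fits the observed precursor mass.
    fitting = [p for p in cache if p.fits_precursor]
    if fitting:
        return max(fitting, key=lambda p: p.score)
    # Otherwise fall back to the best-scoring cached peptide overall.
    return max(cache, key=lambda p: p.score)
```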
Also fixed the amino acid-level score calculation in model.on_predict_epoch_end(), which previously retrieved a list of amino acid scores shifted by one position, i.e. the first AA prediction was assigned the second score, the second AA the third score, etc., with the last AA assigned the score for the stop token.
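To make the off-by-one issue concrete, here is a small illustrative example; the token list and scores are made up, and the exact score layout (one score per decoding step, ending with the stop token) is an assumption.

```python
import torch

aa_tokens = ["P", "E", "P", "T", "K"]
# One score per decoding step: five amino acids followed by the stop token.
step_scores = torch.tensor([0.91, 0.88, 0.95, 0.80, 0.77, 0.99])

# Previous (buggy) alignment: drops the first score and pairs the last amino
# acid with the stop-token score.
buggy = list(zip(aa_tokens, step_scores[1:].tolist()))

# Fixed alignment: the i-th amino acid gets the i-th score, and the
# stop-token score is excluded from the per-AA scores.
fixed = list(zip(aa_tokens, step_scores[: len(aa_tokens)].tolist()))
```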