Add beam search peptide decoding #87

Merged Nov 18, 2022 (44 commits)
Changes from 42 commits
Commits
0710ee8
Add beam search
melihyilmaz Nov 2, 2022
0476618
Delete print statements
melihyilmaz Nov 2, 2022
bfbffa6
Automatically download model weights (#68) (#88)
melihyilmaz Nov 3, 2022
bc7d350
Automatically download model weights (#68) (#89)
melihyilmaz Nov 3, 2022
dc77f1b
Break beam search to testable subfunctions
melihyilmaz Nov 5, 2022
bab7072
Automatically download model weights (#68)
melihyilmaz Nov 5, 2022
3f0576e
Fix precursor m/z termination and filtering
melihyilmaz Nov 6, 2022
6f39e1c
Add unit testing for beam search
melihyilmaz Nov 6, 2022
5d61868
Add beamsearch comments and fix formatting
melihyilmaz Nov 6, 2022
be05748
Merge branch 'main' into beamsearch_melih
melihyilmaz Nov 8, 2022
3c81755
Address requested changes and minor fixes
melihyilmaz Nov 9, 2022
75c9b50
Add more unit tests for beam search
melihyilmaz Nov 9, 2022
b962453
Check NH3 loss for early stopping
melihyilmaz Nov 14, 2022
9050fc8
Consistent parameter order
bittremieux Nov 14, 2022
592efb0
Update docstrings
bittremieux Nov 14, 2022
426ece8
Remove unused precursors parameter
bittremieux Nov 14, 2022
fcb006c
Update beam matching mask in a level higher
bittremieux Nov 14, 2022
6bc2ba4
Minor refactoring to avoid code duplication
bittremieux Nov 14, 2022
cbdcacc
Update imports
bittremieux Nov 14, 2022
646f9dc
Simplification refactoring
bittremieux Nov 14, 2022
616c0c4
Fix unit tests
bittremieux Nov 14, 2022
66c1b2e
Merge remote-tracking branch 'origin/main' into beamsearch_melih
bittremieux Nov 15, 2022
c705b0e
Simplify predicted peptide caching
bittremieux Nov 15, 2022
cbeefa7
Simplify predicted peptide caching
bittremieux Nov 15, 2022
6e3b6da
Simplify predicted peptide caching
bittremieux Nov 15, 2022
1a3bcb1
Unify predicted peptide caching
bittremieux Nov 15, 2022
c2ec4d2
Restrict tensor reshape to subfunction and minor fixes
melihyilmaz Nov 15, 2022
62c51ae
Finish beams when all isotopes exceed the precursor m/z tolerance
bittremieux Nov 15, 2022
57b5b31
Generalize look-ahead for tokens with negative mass
bittremieux Nov 15, 2022
b65aaca
Remove greedy decoding functionality
bittremieux Nov 15, 2022
eb08c5b
Merge branch 'main' into beamsearch_melih
melihyilmaz Nov 15, 2022
a2f9a3d
Handle case with unfinished beams and add test
melihyilmaz Nov 15, 2022
bc64949
Merge branch 'main' into beamsearch_melih
melihyilmaz Nov 16, 2022
63262e9
Upgrade required depthcharge version
melihyilmaz Nov 16, 2022
03b9172
Use detokenize function
melihyilmaz Nov 16, 2022
412093b
Add test for negative mass-aware termination
melihyilmaz Nov 16, 2022
3126e14
Fix negative mass-aware beam termination
melihyilmaz Nov 16, 2022
053967e
Minor refactoring
bittremieux Nov 16, 2022
bdfa915
Add test for dummy output at max length
melihyilmaz Nov 17, 2022
0c6c0f0
Fixed and refactored peptide and score mzTab outputs
melihyilmaz Nov 17, 2022
e50bf90
Add tests for peptide and score output formatting
melihyilmaz Nov 17, 2022
7e934f5
Small fixes
bittremieux Nov 17, 2022
10df5e0
Update changelog
bittremieux Nov 18, 2022
f4fa6c8
Fix changelog update
bittremieux Nov 18, 2022
1 change: 1 addition & 0 deletions casanovo/casanovo.py
@@ -156,6 +156,7 @@ def main(
         weight_decay=float,
         train_batch_size=int,
         predict_batch_size=int,
+        n_beams=int,
         max_epochs=int,
         num_sanity_val_steps=int,
         train_from_scratch=bool,
1 change: 1 addition & 0 deletions casanovo/config.yaml
@@ -65,6 +65,7 @@ weight_decay: 1e-5
 # Training/inference options.
 train_batch_size: 32
 predict_batch_size: 1024
+n_beams: 5

 logger:
 max_epochs: 30
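
The two hunks above register `n_beams` in the type map in `casanovo.py` and give it a default of 5 in `config.yaml`. As a rough illustration of how a type map like this can be applied to loaded settings, here is a minimal sketch. The `coerce_config` helper, the `raw_config` dictionary standing in for the parsed YAML, and the shortened `config_types` map are all assumptions made for this example, not Casanovo's actual code.

```python
# Hypothetical sketch: cast raw config values using a name -> type map.
# None of these names are taken from the Casanovo code base.
config_types = dict(
    weight_decay=float,
    train_batch_size=int,
    predict_batch_size=int,
    n_beams=int,
    max_epochs=int,
)


def coerce_config(raw, types):
    """Return a copy of `raw` with every known, non-empty key cast to its type."""
    coerced = {}
    for key, value in raw.items():
        if key in types and value is not None:
            coerced[key] = types[key](value)
        else:
            # Unknown keys and empty values are passed through unchanged.
            coerced[key] = value
    return coerced


# Stand-in for the parsed config file; some values arrive as strings.
raw_config = {
    "weight_decay": "1e-5",
    "train_batch_size": 32,
    "predict_batch_size": 1024,
    "n_beams": "5",
    "logger": None,
    "max_epochs": 30,
}
config = coerce_config(raw_config, config_types)
```

With a cast like this, a beam count supplied as the string `"5"` would reach the model as an integer, which is the kind of guarantee the `n_beams=int` entry appears to be there to provide.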
27 changes: 15 additions & 12 deletions casanovo/denovo/evaluate.py
@@ -1,6 +1,6 @@
 """Methods to evaluate peptide-spectrum predictions."""
 import re
-from typing import Dict, List, Tuple
+from typing import Dict, Iterable, List, Tuple

 import numpy as np
 from spectrum_utils.utils import mass_diff
@@ -182,8 +182,8 @@ def aa_match(


 def aa_match_batch(
-    peptides1: List[str],
-    peptides2: List[str],
+    peptides1: Iterable,
+    peptides2: Iterable,
     aa_dict: Dict[str, float],
     cum_mass_threshold: float = 0.5,
     ind_mass_threshold: float = 0.1,
@@ -194,10 +194,10 @@ def aa_match_batch(

     Parameters
     ----------
-    peptides1 : List[str]
-        The first list of (untokenized) peptide sequences to be compared.
-    peptides2 : List[str]
-        The second list of (untokenized) peptide sequences to be compared.
+    peptides1 : Iterable
+        The first list of peptide sequences to be compared.
+    peptides2 : Iterable
+        The second list of peptide sequences to be compared.
     aa_dict : Dict[str, float]
         Mapping of amino acid tokens to their mass values.
     cum_mass_threshold : float
@@ -221,13 +221,16 @@ def aa_match_batch(
     """
     aa_matches_batch, n_aa1, n_aa2 = [], 0, 0
     for peptide1, peptide2 in zip(peptides1, peptides2):
-        tokens1 = re.split(r"(?<=.)(?=[A-Z])", peptide1)
-        tokens2 = re.split(r"(?<=.)(?=[A-Z])", peptide2)
-        n_aa1, n_aa2 = n_aa1 + len(tokens1), n_aa2 + len(tokens2)
+        # Split peptides into individual AAs if necessary.
+        if isinstance(peptide1, str):
+            peptide1 = re.split(r"(?<=.)(?=[A-Z])", peptide1)
+        if isinstance(peptide2, str):
+            peptide2 = re.split(r"(?<=.)(?=[A-Z])", peptide2)
+        n_aa1, n_aa2 = n_aa1 + len(peptide1), n_aa2 + len(peptide2)
         aa_matches_batch.append(
             aa_match(
-                tokens1,
-                tokens2,
+                peptide1,
+                peptide2,
                 aa_dict,
                 cum_mass_threshold,
                 ind_mass_threshold,
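
The hunk above lets `aa_match_batch` accept either peptide strings or peptides that are already split into amino acid tokens. The sketch below isolates that normalization step so the effect of the regular expression is easier to see. The `split_peptide` helper and the example peptide are assumptions made for illustration; only the `isinstance` check and the regex come from the diff.

```python
import re


def split_peptide(peptide):
    """Return amino acid tokens for a peptide string or an existing token sequence."""
    if isinstance(peptide, str):
        # Split before every uppercase letter except the first character, so a
        # mass modification such as "+15.995" stays attached to its residue.
        return re.split(r"(?<=.)(?=[A-Z])", peptide)
    # Already tokenized: copy into a list so len() works for any iterable.
    return list(peptide)


tokens_from_str = split_peptide("PEPM+15.995TIDE")
tokens_from_list = split_peptide(("P", "E", "P", "M+15.995", "T", "I", "D", "E"))
```

Both calls yield the same eight tokens, so the downstream residue counts and the per-residue comparison in `aa_match` should not depend on which form the caller passes.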