Eliminate evaluate Command #359
This works, but I find the changes in model.py a bit messy and unsatisfying.
If the goal of the evaluation is to get the peptide and amino acid metrics, can this not be simplified by:
- First just do standard predictions.
- After the whole inference has been finished, you then have all of the peptides for each spectrum. Then (at a higher level than inside of the model) you can read the peptide sequences from the annotated MGF separately, and compare these to each other.
- This removes all of the validation-related complexity from the model and should simplify the flow of the data considerably.
- The evaluation part is also more maintainable, and it would for example be much easier to add another data source for evaluation (e.g. an mzTab or CSV file with "ground truth" rather than an annotated MGF).
Note that it's slightly different from the current validation approach, because that also gives the loss, which you wouldn't have in this approach. However, it seems to me that the loss is not that informative anyway, and not something a user would expect to get when specifying evaluate during prediction.
The best approach to tackle this should probably be discussed.
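To make the proposed flow concrete, here is a minimal sketch of evaluating after inference has finished, outside of the model. The function names (read_mgf_sequences, peptide_precision) are hypothetical and not part of the project's API, and it assumes predictions arrive as one peptide string (or None) per spectrum, in file order. It also compares sequences by exact string match, which is a simplification: a real implementation would probably want a more tolerant, mass-aware comparison.

```python
from typing import Dict, Iterable, List, Optional


def read_mgf_sequences(lines: Iterable[str]) -> List[Optional[str]]:
    """Return the SEQ= annotation of each spectrum in MGF text, or None."""
    sequences: List[Optional[str]] = []
    current: Optional[str] = None
    in_spectrum = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            in_spectrum, current = True, None
        elif line == "END IONS" and in_spectrum:
            sequences.append(current)
            in_spectrum = False
        elif in_spectrum and line.upper().startswith("SEQ="):
            # An empty annotation is treated the same as a missing one.
            current = line[4:].strip() or None
    return sequences


def peptide_precision(
    predicted: List[Optional[str]], truth: List[Optional[str]]
) -> Dict[str, float]:
    """Compare predictions to annotations using exact string equality.

    Assumption: both lists are aligned one-to-one, in file order.
    Spectra without an annotation are skipped.
    """
    if len(predicted) != len(truth):
        raise ValueError("predictions and annotations must align one-to-one")
    pairs = [(p, t) for p, t in zip(predicted, truth) if t is not None]
    n_predicted = sum(p is not None for p, _ in pairs)
    n_correct = sum(p is not None and p == t for p, t in pairs)
    return {
        "n_annotated": len(pairs),
        "pep_precision": n_correct / n_predicted if n_predicted else 0.0,
        "coverage": n_predicted / len(pairs) if pairs else 0.0,
    }
```

Because the ground truth only enters through a list of sequences, swapping the annotated MGF for another source (e.g. a CSV reader) would only mean replacing read_mgf_sequences.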
I agree that this approach makes more sense. Tentatively what I'm thinking is that |
* bug report template
* punctuation, hardware description item
* Restrict NumPy to pre-2.0 (#344)
* Restrict NumPy to pre-2.0
* Update changelog
* Update paper reference (#361)
---------
Co-authored-by: Lilferrit <[email protected]>
I reimplemented evaluate mode by having |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
Coverage Diff, dev vs. #359:
- Coverage: 94.03% to 94.26% (+0.23%)
- Files: 12 to 12
- Lines: 1022 to 1029 (+7)
- Hits: 961 to 970 (+9)
- Misses: 61 to 59 (-2)
My only concern at this point is how to go about testing the evaluation metric calculations. The simplest solution I can think of is to have log_metrics also return the aa_precision and pep_precision in addition to the logging operation. From there I could introduce some unit tests.
I don't think that we need to check the actual metric values. We have dedicated unit tests that already do that.
What should be added is some tests that verify correct behavior without/with evaluation specified based on different types of input files (annotated vs simple MGF, mzML).
Sounds good, I'll look into getting some tests implemented for these cases. |
I did some experimenting while trying to set up the test cases, and it looks like in the current implementation of Imo silently ignoring unannotated files is not desirable behavior when running model evaluation after sequencing (it also means that the unannotated files would simply not get sequenced). The best way that comes to mind to get around this issue is to check that all of the peak files are annotated before sequencing begins, and to throw an exception if any of them aren't. However, I'm not sure if there's a quick and easy way to do this. I put the test cases in progress on the branch
I agree.
That's tricky, because there's indeed no elegant way to do this, so I'm not really in favor of trying to hack this in. Instead, giving better error messages is a good starting point. Then at least users will know what the problem is and how to fix it if they want to run evaluation. |
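For reference, the kind of up-front check and error message discussed above could look roughly like the sketch below. This is not what the PR implements: the function names are hypothetical, it assumes annotations are stored as SEQ= lines inside each BEGIN IONS / END IONS block, and it reads every MGF file once in full before sequencing starts, which is exactly the extra pass that makes this approach inelegant.

```python
from pathlib import Path
from typing import Iterable, List


def count_unannotated_spectra(mgf_path: Path) -> int:
    """Count spectra in an MGF file that have no non-empty SEQ= line."""
    missing = 0
    in_spectrum = False
    has_seq = False
    with open(mgf_path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line == "BEGIN IONS":
                in_spectrum, has_seq = True, False
            elif line == "END IONS" and in_spectrum:
                if not has_seq:
                    missing += 1
                in_spectrum = False
            elif in_spectrum and line.upper().startswith("SEQ="):
                if line[4:].strip():
                    has_seq = True
    return missing


def check_files_for_evaluation(paths: Iterable[Path]) -> None:
    """Raise one descriptive error listing every file that cannot be evaluated."""
    problems: List[str] = []
    for path in map(Path, paths):
        if path.suffix.lower() != ".mgf":
            # Assumption: only MGF files can carry peptide annotations here.
            problems.append(f"{path}: not an MGF file, so it has no annotations")
            continue
        missing = count_unannotated_spectra(path)
        if missing:
            problems.append(f"{path}: {missing} spectra lack a SEQ= annotation")
    if problems:
        raise ValueError(
            "Evaluation requires annotated MGF files. Problems found:\n  "
            + "\n  ".join(problems)
        )
```

Collecting all problems before raising means the user sees every offending file at once instead of fixing them one run at a time. Note that an MGF file containing no spectra at all passes this check.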
* save best model
* save best model
* updated unit tests
* remove save top k config item
* added save_top_k to deprecated config options
* changelog entry
* test case, formatting
* requested changes
I added some light error handling such that if the |
A few more final tweaks.
Eliminated the evaluate command in favor of a --evaluate command line option for the sequence command. Evaluation metrics will still be logged to the console as before if the --evaluate option is set. The model (Spec2Pep) will also log predictions to its out_writer in validation mode, similarly to how it does in prediction mode.
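As a rough illustration of the resulting command line shape, the sketch below models a sequence subcommand with a boolean --evaluate option. It uses the standard library's argparse purely for illustration; the program name, the peak_path argument, and describe_run are hypothetical, and the project's actual CLI framework and option definitions may differ.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser: one `sequence` subcommand with an `--evaluate` flag."""
    parser = argparse.ArgumentParser(prog="denovo-tool")  # hypothetical name
    subcommands = parser.add_subparsers(dest="command", required=True)
    sequence = subcommands.add_parser(
        "sequence", help="Predict peptides for the given peak files."
    )
    sequence.add_argument("peak_path", nargs="+", help="One or more peak files.")
    sequence.add_argument(
        "--evaluate",
        action="store_true",
        help="Also report evaluation metrics (requires annotated input).",
    )
    return parser


def describe_run(argv: Optional[List[str]] = None) -> str:
    """Parse arguments and summarize what the run would do."""
    args = build_parser().parse_args(argv)
    mode = "sequence and evaluate" if args.evaluate else "sequence only"
    return f"{mode}: {', '.join(args.peak_path)}"
```

For example, describe_run(["sequence", "--evaluate", "a.mgf"]) selects the evaluating path, while omitting the flag leaves plain sequencing as the default.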