added options for embeddings dimension reduction #123
Conversation
Thank you so much for your contribution to biotrainer!! :) I provided some feedback on your changes; in general, they look very good.
I also rebased your branch to be consistent with the latest version of the develop branch, so that you do not have to do the somewhat tedious work of resolving the conflicts yourself: https://github.com/biocentral/biotrainer/commits/feature/emb-dim-redu-rebased/ I will provide you with detailed instructions on how to use that branch before applying the suggestions in the next comment. Thanks a lot again for your contribution!
@classproperty
def allowed_protocols(self) -> List[Protocol]:
    return [Protocol.sequence_to_class, Protocol.sequence_to_value]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use Protocol.using_per_sequence_embeddings() here instead.
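For illustration, a minimal sketch of that change, assuming that Protocol.using_per_sequence_embeddings() returns the list of protocols operating on per-sequence embeddings (an assumption, not confirmed in this thread):

```python
# Sketch of the suggestion; assumes Protocol.using_per_sequence_embeddings()
# returns the list of protocols that work on per-sequence (per-protein) embeddings.
@classproperty
def allowed_protocols(self) -> List[Protocol]:
    return Protocol.using_per_sequence_embeddings()
```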
I copied this part from the Interaction class in general_options.py; should I change it there too?
Here's how you can update your branch so that it resolves the conflicts, assuming that you are on your current PR branch:
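(The exact commands did not survive in this thread; below is only an assumed sketch of a typical workflow for adopting the rebased branch linked above, with origin as the assumed remote name, not necessarily the instructions that were originally posted.)

```
# Assumed workflow, not the original instructions:
git fetch origin                                       # fetch the rebased branch from the remote
git checkout <your-pr-branch>                          # make sure you are on your PR branch
git reset --hard origin/feature/emb-dim-redu-rebased   # point your branch at the rebased commits
git push --force-with-lease                            # safely overwrite the remote PR branch
```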
P.S.: Do not worry about the broken Windows tests; they are failing because of onnx at the moment: #111
I couldn't "git push --force-with-lease" successfully; I had to pull and merge some conflicts before pushing it again successfully.
Thank you for all your feedback; I really appreciate it! I have made the changes; are you able to see the commits and review the PR again?
Only one minor additional suggestion :)
if not self._protocol.using_per_sequence_embeddings():
    logger.info("Dimensionality reduction cannot be performed as \
        the embeddings are not per-protein embeddings")
    return False
I think it would be better to raise an exception here, because if the user provides both dimension_reduction_method and n_reduced_components, then he or she will probably expect this to work and might not be aware that it was not done after training. Admittedly, this is an edge case, because if the number of samples is less than 3, then the train/validation/test splitting will also not work. But better safe than sorry :)
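As a sketch of this suggestion (ValueError is only a placeholder here; biotrainer may define its own configuration error type):

```python
# Sketch only: raise instead of logging and returning False, so that a misconfiguration
# does not silently skip the requested dimensionality reduction.
if not self._protocol.using_per_sequence_embeddings():
    raise ValueError("Dimensionality reduction cannot be performed because "
                     "the embeddings are not per-protein embeddings")
```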
Thank you for applying the changes so quickly! I will look into rebasing again this week. Do you also want to update the documentation on the new config file options and add a new example on how to use them? If you do not have time for that currently, I would just add it after the PR is merged, so just let me know if you want to work on that. Thanks again for the contribution :)
Hi, sorry it took me a while to implement the changes. I have also updated the config file options documentation and added a new example on how to apply the dimension reduction.
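(For context, a hypothetical snippet of what the new config file options might look like; only the two option names and the two method choices are mentioned in this thread, so the concrete values and file layout below are assumptions. See the biotrainer documentation for the real example.)

```yaml
# Hypothetical biotrainer config snippet; the option names are taken from this thread,
# the values (umap / tsne, 5) are assumptions for illustration only.
dimension_reduction_method: umap   # or: tsne
n_reduced_components: 5            # assumed: number of embedding dimensions after reduction
```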
Co-authored-by: nadyadevani3112 <[email protected]>
…y, updated documentation and added example for using embeddings dimensionality reduction
Force-pushed from 3aa9815 to 9330fbe
Hi! Thanks for applying the changes and all your work! The final state looked good to me. In order to fix the conflicts with the current develop branch, I cherry-picked your commits and made some tiny additions to them with you as the co-author. Then I had to force-push the updated branch in order to be able to merge the PR. I hope that was okay with you! You can still find the original branch with the merges here: https://github.com/biocentral/biotrainer/tree/feature/embedding-dimension-reduction-keep Thanks again for your contribution. If you plan to contribute to biotrainer in the future, feel free to reach out to me via my university mail address (see my profile). I would also be curious to hear how you are using biotrainer for your application or research :)
Provided two options for embeddings dimension reduction: UMAP or t-SNE.