added options for embeddings dimension reduction #123

Conversation

nadyadevani3112
Contributor

Provides two options for embedding dimension reduction: UMAP or t-SNE.
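
For readers unfamiliar with the two methods, a switch between them might look roughly like the sketch below. This is not biotrainer's implementation: the function name `reduce_embeddings`, its parameters, and the choice of scikit-learn's `TSNE` and umap-learn's `UMAP` as backends are all assumptions made for illustration.

```python
from typing import Sequence


def reduce_embeddings(embeddings: Sequence[Sequence[float]],
                      method: str,
                      n_components: int = 2):
    """Hypothetical helper: project per-sequence embeddings to n_components dims."""
    # Accept spellings such as "UMAP", "tsne" or "T-SNE".
    method = method.lower().replace("-", "")
    if method not in ("umap", "tsne"):
        raise ValueError(f"Unknown dimension reduction method: {method!r}")
    if n_components < 1:
        raise ValueError("n_components must be at least 1")

    # Third-party imports are deferred so the validation above works without them.
    import numpy as np
    data = np.asarray(embeddings, dtype=float)

    if method == "umap":
        import umap  # provided by the umap-learn package (assumed backend)
        reducer = umap.UMAP(n_components=n_components)
    else:
        from sklearn.manifold import TSNE  # assumed backend
        reducer = TSNE(
            n_components=n_components,
            # scikit-learn requires perplexity to be smaller than the sample count
            perplexity=min(30.0, len(data) - 1),
            # the default Barnes-Hut solver only supports fewer than 4 components
            method="barnes_hut" if n_components < 4 else "exact",
        )
    return reducer.fit_transform(data)
```

Both reducers expose `fit_transform`, which is why a single return statement suffices. Very small inputs can still fail or trigger warnings inside the libraries themselves.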

@SebieF SebieF self-requested a review November 21, 2024 13:01
Collaborator

@SebieF SebieF left a comment

Thank you so much for your contribution to biotrainer!! :) I provided some feedback on your changes; in general, they look very good.

I also rebased your branch onto the latest version of the develop branch, so that you do not have to do the somewhat tedious work of resolving conflicts: https://github.com/biocentral/biotrainer/commits/feature/emb-dim-redu-rebased/ In the next comment, I will give you detailed instructions on how to use it before you apply the suggestions. Thanks a lot again for your contribution!


@classproperty
def allowed_protocols(self) -> List[Protocol]:
    return [Protocol.sequence_to_class, Protocol.sequence_to_value]
Collaborator

You could use Protocol.using_per_sequence_embeddings() here instead.

Contributor Author

I copied this part from the Interaction class in general_options.py. Should I change it there too?
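
For context, the suggestion in this thread amounts to reusing one shared list of protocols instead of repeating it in every option class. The sketch below illustrates the idea with a simplified stand-in: this `Protocol` enum, its three members, and the `DimensionReductionOption` class are assumptions for illustration and may not match the real biotrainer classes.

```python
from enum import Enum
from typing import List


class Protocol(Enum):
    # Simplified stand-in for biotrainer's Protocol enum (assumed members).
    residue_to_class = 1
    sequence_to_class = 2
    sequence_to_value = 3

    @staticmethod
    def using_per_sequence_embeddings() -> List["Protocol"]:
        # Single source of truth for protocols that work on per-sequence embeddings.
        return [Protocol.sequence_to_class, Protocol.sequence_to_value]


class DimensionReductionOption:
    # Hypothetical config option class, named here only for the example.
    @staticmethod
    def allowed_protocols() -> List[Protocol]:
        # Delegates to the shared helper instead of hard-coding the list again.
        return Protocol.using_per_sequence_embeddings()
```

With this layout, adding a further per-sequence protocol later only requires a change in one place.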

biotrainer/config/embedding_options.py (outdated, resolved)
biotrainer/config/embedding_options.py (outdated, resolved)
biotrainer/trainers/trainer.py (outdated, resolved)
pyproject.toml (outdated, resolved)
biotrainer/embedders/embedding_service.py (outdated, resolved)
biotrainer/embedders/embedding_service.py (outdated, resolved)
biotrainer/embedders/embedding_service.py (outdated, resolved)
biotrainer/embedders/embedding_service.py (resolved)
biotrainer/embedders/embedding_service.py (outdated, resolved)
@SebieF
Collaborator

SebieF commented Nov 21, 2024

Here's how you can change your branch such that it resolves the conflicts:

Assuming that you are on your current PR branch:

git remote add upstream https://github.com/biocentral/biotrainer.git
git fetch upstream
git reset --hard upstream/feature/emb-dim-redu-rebased
git push --force-with-lease

@SebieF
Collaborator

SebieF commented Nov 21, 2024

P.S.: Do not worry about the broken Windows tests; they are currently failing because of onnx: #111

@nadyadevani3112
Contributor Author

I couldn't run "git push --force-with-lease" successfully; I had to pull and merge some conflicts before I could push again.

@nadyadevani3112
Contributor Author

Thank you for all your feedback, I really appreciate it! I have made the changes. Are you able to see the commits and review the PR again?

@SebieF SebieF self-requested a review November 25, 2024 16:17
Collaborator

@SebieF SebieF left a comment

Only one minor additional suggestion :)

if not self._protocol.using_per_sequence_embeddings():
    logger.info("Dimensionality reduction cannot be performed as \
    the embeddings are not per-protein embeddings")
    return False
Collaborator

I think it would be better to raise an exception here, because if the user provides both the dimension_reduction_method and n_reduced_components, then they will probably expect this to work and might not be aware after training that it was not done. Admittedly, this is an edge case, because if the number of samples is less than 3, then the train/validation/test splitting will not work either. But better safe than sorry :)
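
The difference between logging and raising can be sketched as follows. The function name `check_dimension_reduction`, its parameters, and the use of `ValueError` are hypothetical, and the two conditions are taken loosely from the snippet and the comment above; the checks actually merged into biotrainer may differ.

```python
def check_dimension_reduction(uses_per_sequence_embeddings: bool,
                              n_samples: int,
                              min_samples: int = 3) -> None:
    """Hypothetical guard: fail loudly instead of silently skipping the reduction."""
    if not uses_per_sequence_embeddings:
        # Previously only logged; raising makes the misconfiguration visible.
        raise ValueError(
            "Dimensionality reduction cannot be performed because "
            "the embeddings are not per-protein embeddings"
        )
    if n_samples < min_samples:
        # Assumed threshold, based on the "less than 3" samples remark above.
        raise ValueError(
            f"Dimensionality reduction needs at least {min_samples} samples, "
            f"got {n_samples}"
        )
```

A caller that configured the reduction then either gets reduced embeddings or an error at startup, never an unreduced run that only a log line reveals.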

@SebieF
Collaborator

SebieF commented Nov 25, 2024

Thank you for applying the changes so quickly! I will look into rebasing again this week. Do you also want to update the documentation on the new config file options and add a new example on how to use it? If you do not have time for that currently, I would just add it after the PR is merged, so just let me know if you want to work on that. Thanks again for the contribution :)

@nadyadevani3112
Contributor Author

Hi, sorry it took me a while to implement the changes. I have also updated the documentation of the config file options and added a new example showing how to apply the dimension reduction.

@SebieF SebieF force-pushed the feature/embedding-dimension-reduction branch from 3aa9815 to 9330fbe Compare December 9, 2024 12:44
@SebieF SebieF self-requested a review December 9, 2024 12:44
@SebieF
Collaborator

SebieF commented Dec 9, 2024

Hi! Thanks for applying the changes and for all your work! The final state looked good to me. To fix the conflicts with the current develop branch, I cherry-picked your commits and made some tiny additions to them, with you as the co-author. I then had to force-push the updated branch to be able to merge the PR. I hope that is okay with you! You can still find the original branch with the merges here: https://github.com/biocentral/biotrainer/tree/feature/embedding-dimension-reduction-keep

Thanks again for your contribution. If you plan to contribute to biotrainer in the future, feel free to reach out to me via my university mail address (see my profile). I would also be curious to hear how you are using biotrainer for your application or research :)

@SebieF SebieF merged commit daaa0aa into sacdallago:develop Dec 9, 2024
2 of 4 checks passed
@SebieF SebieF mentioned this pull request Dec 9, 2024
2 participants