add first version of SCOT demonstration notebook (Gromov-Wasserstein for multi-omics) #64

Merged
merged 5 commits into from
May 9, 2022

antoinebelloir (Contributor) commented May 5, 2022

Adds a second biology example to the documentation notebooks.

The notebook presents an application of OTT's Gromov-Wasserstein optimal transport to match single-cell point clouds from two different measurement spaces (e.g. mapping gene expression measurements to chromatin accessibility measurements).
It is adapted from Demetci et al., Gromov-Wasserstein optimal transport to align single-cell multi-omics data, ICML 2020 Workshop on Computational Biology, 2020 (pdf here).
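
For readers unfamiliar with the objective, here is a minimal sketch (not the notebook's code and not OTT's API) of the squared-loss Gromov-Wasserstein cost used to score a coupling between two point clouds. It only compares pairwise distances within each domain, which is why the two clouds do not need to share a feature space. The name `gw_cost` and the toy matrices are illustrative assumptions.

```python
from itertools import product


def gw_cost(dist_x, dist_y, coupling):
    """Squared-loss Gromov-Wasserstein cost of a given coupling.

    dist_x: n x n intra-domain distances for the first point cloud.
    dist_y: m x m intra-domain distances for the second point cloud.
    coupling: n x m matrix; coupling[i][j] is the mass sent from i to j.
    """
    n, m = len(dist_x), len(dist_y)
    total = 0.0
    for i, i2 in product(range(n), repeat=2):
        for j, j2 in product(range(m), repeat=2):
            # Penalise pairs whose distance changes under the matching.
            gap = dist_x[i][i2] - dist_y[j][j2]
            total += gap * gap * coupling[i][j] * coupling[i2][j2]
    return total


# Toy example: two 3-point clouds with identical internal geometry.
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
identity = [[1 / 3 if i == j else 0.0 for j in range(3)] for i in range(3)]
uniform = [[1 / 9] * 3 for _ in range(3)]

cost_identity = gw_cost(dist, dist, identity)  # geometry-preserving match
cost_uniform = gw_cost(dist, dist, uniform)  # uninformative coupling
```

The quadruple loop is only meant to make the definition readable; practical solvers avoid it and add entropic regularisation controlled by a parameter usually called epsilon.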

@antoinebelloir changed the title from "add first version of SCOT demonstration notebook (Gromov-Wassertein for multi-omics)" to "add first version of SCOT demonstration notebook (Gromov-Wasserstein for multi-omics)" May 5, 2022
@antoinebelloir marked this pull request as draft May 5, 2022 15:43
@antoinebelloir marked this pull request as ready for review May 5, 2022 15:50

review-notebook-app bot commented May 5, 2022

View / edit / reply to this conversation on ReviewNB

marcocuturi commented on 2022-05-05T21:00:52Z
----------------------------------------------------------------

the title here will define the length of that item in the left-menu, so maybe something shorter.


antoinebelloir commented on 2022-05-08T23:24:54Z
----------------------------------------------------------------

Changed to "Gromov-Wasserstein for multi-omics"

marcocuturi commented on 2022-05-05T21:00:53Z
----------------------------------------------------------------

Maybe follow the rest of the templates and provide an introductory sentence rather than a tl;dr.


marcocuturi commented on 2022-05-05T21:00:54Z
----------------------------------------------------------------

typo in the Wasserstein


marcocuturi commented on 2022-05-05T21:00:54Z
----------------------------------------------------------------

I see \ characters there.

I think to keep in line with the other notebooks, the motivation should be shorter.


antoinebelloir commented on 2022-05-08T23:27:47Z
----------------------------------------------------------------

Agreed, the introduction paragraphs were too long. I replaced them with a much shorter presentation that is more consistent with the other notebooks.

marcocuturi commented on 2022-05-05T21:00:55Z
----------------------------------------------------------------

I think it's fine to say that you replace POT with OTT, but if you mention "found to be faster", I think it is important to include a comparison. Otherwise, maybe say something a bit more informative (POT can be slower if, e.g., it does not have the same convergence settings).

I think you can remove the two bottom sentences starting with "At the end..." and "Note that..."


antoinebelloir commented on 2022-05-08T23:34:10Z
----------------------------------------------------------------

I agree that a comparison of OTT vs POT is useful, so I added one to the notebook. You are right to point out the convergence settings: to keep the comparison fair, I used POT's default parameters for OTT (maximum number of iterations = 1000 and threshold = 1e-9) and added a comment to make this explicit. You will see that OTT runs much faster on GPU. I did not run comparison experiments on CPU.
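
A minimal sketch of how such a like-for-like timing could be set up is shown below. It is not the notebook's benchmark: `time_solver`, `dummy_solver` and the `solve(problem, max_iterations=..., threshold=...)` wrapper signature are hypothetical, and each real library would need its own thin wrapper that maps these two settings onto its own argument names.

```python
import time


def time_solver(solve, problem, *, max_iterations, threshold, repeats=3):
    """Return the best wall-clock time of `solve` over `repeats` runs.

    `solve` is a hypothetical callable wrapping one library's solver. Both
    wrappers must honour the same `max_iterations` and `threshold`, so that
    the two solvers stop under the same rule.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        solve(problem, max_iterations=max_iterations, threshold=threshold)
        # A JAX-based wrapper should block on its result before returning
        # (e.g. with `block_until_ready()`), because JAX dispatches work
        # asynchronously and the timer would otherwise stop too early.
        best = min(best, time.perf_counter() - start)
    return best


# Shared settings quoted in the comment above (POT's defaults).
SETTINGS = {"max_iterations": 1000, "threshold": 1e-9}


def dummy_solver(problem, *, max_iterations, threshold):
    """Stand-in for a real solver wrapper; it only sums its input."""
    return sum(problem)


elapsed = time_solver(dummy_solver, list(range(1000)), **SETTINGS)
```

Taking the best of several runs also keeps one-off costs, such as JAX compilation on the first call, out of the reported number.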

marcocuturi commented on 2022-05-05T21:00:56Z
----------------------------------------------------------------

try to use code lines that have 80 characters max to follow python guidelines.


antoinebelloir commented on 2022-05-08T23:36:48Z
----------------------------------------------------------------

Following your comment I used Black to format most of my cells. It's cleaner now
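
Note that Black's default line length is 88 characters, so an 80-character limit has to be requested explicitly (`--line-length 80`). As a quick check that does not depend on any formatter, the sketch below scans the code cells of a parsed notebook for over-long lines. The helper name `long_code_lines` and the in-memory example are assumptions made for illustration.

```python
def long_code_lines(notebook, limit=80):
    """List (cell_index, line_number, length) for over-long code lines.

    `notebook` is a parsed .ipynb file (nbformat 4), i.e. a dict with a
    "cells" list whose entries carry "cell_type" and "source".
    """
    offenders = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        for line_number, line in enumerate(text.splitlines(), start=1):
            if len(line) > limit:
                offenders.append((cell_index, line_number, len(line)))
    return offenders


# Tiny in-memory notebook standing in for a real file.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# " + "x" * 100]},
        {"cell_type": "code", "source": ["a = 1\n", "b = " + "1" * 90]},
    ]
}
offenders = long_code_lines(example)
```

For a real notebook, the dict would come from `json.load` applied to the opened `.ipynb` file.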

marcocuturi commented on 2022-05-05T21:00:57Z
----------------------------------------------------------------

Line #3.    %pip install seaborn

Here you could remove the #


antoinebelloir commented on 2022-05-08T23:36:58Z
----------------------------------------------------------------

Done

marcocuturi commented on 2022-05-05T21:00:57Z
----------------------------------------------------------------

Line #6.      def find_correspondences(self, e, verbose=True):

renaming "e" to "epsilon" would make it more readable


antoinebelloir commented on 2022-05-08T23:40:23Z
----------------------------------------------------------------

Agreed. I made the change inside the body of "find_correspondences", but I could not rename "e" to "epsilon" in the arguments without making the code heavier, since "e" is a keyword argument used by another method of the SCOT class.
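
For reference, one common way to rename a keyword argument without breaking callers that still pass the old name is to accept both for a transition period. The sketch below illustrates the idea on a hypothetical `Aligner` class; it is not the notebook's SCOT class, and whether the extra branching is worth it in a demonstration notebook is a judgment call.

```python
import warnings


class Aligner:
    """Hypothetical stand-in for a class whose methods share an `e` keyword."""

    def find_correspondences(self, epsilon=None, *, e=None, verbose=True):
        # Accept the old short name, but steer callers to the readable one.
        if e is not None:
            if epsilon is not None:
                raise TypeError("pass either 'epsilon' or 'e', not both")
            warnings.warn(
                "'e' is deprecated, use 'epsilon'",
                DeprecationWarning,
                stacklevel=2,
            )
            epsilon = e
        if epsilon is None:
            raise TypeError("missing required argument: 'epsilon'")
        if verbose:
            print(f"running with epsilon={epsilon}")
        return epsilon

    def align(self, **kwargs):
        # Callers that still forward `e=...` keep working unchanged.
        return self.find_correspondences(**kwargs)


aligner = Aligner()
new_style = aligner.find_correspondences(epsilon=1e-3, verbose=False)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    old_style = aligner.align(e=1e-3, verbose=False)
```

Positional calls such as `find_correspondences(1e-3)` behave as before, since the first positional parameter is still the regularisation strength.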

marcocuturi commented on 2022-05-05T21:00:58Z
----------------------------------------------------------------

Line #11.                                epsilon=e, max_iterations=200,

are you using a different setting than that used by default for max_iterations? if yes might be worth commenting on this.


antoinebelloir commented on 2022-05-08T23:41:39Z
----------------------------------------------------------------

See my reply about OTT vs POT comparison. I changed max_iterations to 1000 and I set threshold to 1e-9 to match the POT implementation.
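
To make the role of these two settings concrete, here is a minimal pure-Python sketch of a Sinkhorn loop that stops either when an error falls below `threshold` or when `max_iterations` is reached. It is neither OTT's nor POT's implementation: the error measure (L1 violation of the column marginal) is an assumption, and which loop the settings control in a Gromov-Wasserstein solver depends on the library.

```python
import math


def sinkhorn(cost, a, b, epsilon, max_iterations=1000, threshold=1e-9):
    """Entropic OT by Sinkhorn scaling.

    Returns (coupling, iterations_used, final_error).
    """
    n, m = len(a), len(b)
    kernel = [
        [math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)
    ]
    u, v = [1.0] * n, [1.0] * m
    error = float("inf")
    for iteration in range(1, max_iterations + 1):
        # Rescale columns, then rows, to match the target marginals.
        v = [
            b[j] / sum(kernel[i][j] * u[i] for i in range(n))
            for j in range(m)
        ]
        u = [
            a[i] / sum(kernel[i][j] * v[j] for j in range(m))
            for i in range(n)
        ]
        # After the row update, only the column marginal can be off.
        error = sum(
            abs(v[j] * sum(kernel[i][j] * u[i] for i in range(n)) - b[j])
            for j in range(m)
        )
        if error < threshold:
            break
    coupling = [
        [u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)
    ]
    return coupling, iteration, error


cost = [[0.0, 1.0], [1.0, 0.0]]
a, b = [0.7, 0.3], [0.4, 0.6]
coupling, iterations, error = sinkhorn(cost, a, b, epsilon=0.5)
_, capped_iterations, capped_error = sinkhorn(
    cost, a, b, epsilon=0.5, max_iterations=1
)
```

With a small iteration budget the loop can exit before the threshold is met, which is why two libraries should be compared under the same pair of settings.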

marcocuturi commented on 2022-05-05T21:00:59Z
----------------------------------------------------------------

for epsilon and k, you could use quotes or latex


antoinebelloir commented on 2022-05-08T23:41:47Z
----------------------------------------------------------------

Corrected

marcocuturi commented on 2022-05-05T21:01:00Z
----------------------------------------------------------------

same here for X and Y, use latex (markdown)


antoinebelloir commented on 2022-05-08T23:41:56Z
----------------------------------------------------------------

Also corrected

marcocuturi commented on 2022-05-05T21:01:00Z
----------------------------------------------------------------

check this sentence ("we actually")


antoinebelloir commented on 2022-05-08T23:42:39Z
----------------------------------------------------------------

I rewrote this cell to make it clearer and to mention the OTT vs POT comparison

marcocuturi commented on 2022-05-05T21:01:01Z
----------------------------------------------------------------

Line #25.    anim = animation.FuncAnimation(fig, animate, init_func = init,

if this is an animation with 2 plots, maybe plot them directly?


antoinebelloir commented on 2022-05-08T23:47:36Z
----------------------------------------------------------------

I see how using an animation for only two plots looks like overkill, but I still find it useful to see the crosses (representing chromatin accessibility points mapped to the gene expression domain) appear on top of the gene expression domain points (circles). I deleted the previous visualisation, a single plot of the two sets of points (i.e. the second frame of the animation), as it was a duplicate.

marcocuturi commented on 2022-05-05T21:01:02Z
----------------------------------------------------------------

Line #26.                                   frames= [0,1],  # Ic, c'est la valeur de k

french comment


antoinebelloir commented on 2022-05-08T23:47:49Z
----------------------------------------------------------------

Deleted

@marcocuturi (Contributor)

This is starting to look good!

May I suggest that you also add the link to that notebook in

https://github.com/ott-jax/ott/blob/main/docs/index.rst

so that it is directly referenced in the doc?

I would probably add it in the "advanced applications" section

@antoinebelloir (Contributor, Author)

Thanks! I added it to the "advanced applications" list, just after the "Single-cell genomics" notebook, in order to group the biology-related examples.

@marcocuturi marcocuturi merged commit 627e0ba into ott-jax:main May 9, 2022
@marcocuturi (Contributor)

Thanks!

@antoinebelloir (Contributor, Author)

It was very interesting to contribute to this notebook! One detail: the merged notebook displays nicely in the repository preview, but has two small display issues on the readthedocs.io website (two images attached).
I do not know how to fix them.

@marcocuturi (Contributor)

Thanks Antoine! I think I fixed that. It seems you cannot put a URL on top of a ... code formatting. You could fix it with a PR, but no worries, I took care of it. Thanks a lot again for this great contribution to the toolbox!!

michalk8 pushed a commit that referenced this pull request Jun 27, 2024
add first version of SCOT demonstration notebook (Gromov-Wasserstein for multi-omics)