-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding mia examples #50
base: main
Are you sure you want to change the base?
Adding mia examples #50
Conversation
…ed notebook even inside shiny proxy
… with deep-linked variables. style and docs updates.
…jupyter Fixes for running inside Shiny Proxy
…ve_metagenomics Comparative metagenomics
…BI-Metagenomics#10) * docs: update README.md [skip ci] * docs: create .all-contributorsrc [skip ci] Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
…-Metagenomics#9) * docs: update README.md [skip ci] * docs: create .all-contributorsrc [skip ci] Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> Co-authored-by: Sandy Rogers <[email protected]>
* docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> Co-authored-by: Sandy Rogers <[email protected]>
…ent (EBI-Metagenomics#17) * docs: update .all-contributorsrc [skip ci] * docs: update README.md [skip ci] Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com> Co-authored-by: Sandy Rogers <[email protected]>
…BI-Metagenomics#18) * docs: update README.md [skip ci] * docs: update .all-contributorsrc [skip ci] Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I do not have now time to review this in detail. In general, this looks good. However, some operations are done little bit too complicated. I commented at least some of them, check the whole code if there is room to simplify. As this is overview to MGnifyR and mia, we should make the examples rather simple. By that I mean that we should use methods found in SE ecosystem if possible as they make things easier.
About mia version issue. It would be best to have the most updated version of mia so that we do not have to update function names etc in the future. Of course that depends also on the server requirements and stuff like that.
You can find info on installing development version of mia from here https://microbiome.github.io/OMA/docs/devel/pages/06_packages.html#package-installation
The easiest is to install it from GitHub. (However, installing devel version from Bioconductor is usually preferred).
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
.../R Mia Examples/Fetch-Analyses-metadata-for-a-Study_files/libs/bootstrap/bootstrap-icons.css
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments for starters.
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok herewego! Some more suggestions. The essential thing is to add more support for using the (Tree)SE ecosystem tools
@antagomir I was wondering - the "Fetch analyses metadata for a study" file that I pulled from the other PR - should I just remove it completely? I think there was some confusion. It's basically a replica of this notebook - https://docs.mgnify.org/src/notebooks/R%20Examples/Fetch%20Analyses%20metadata%20for%20a%20Study.html But the only difference is the last section where we convert to mia vs phyloseq. But I don't really see it being needed, because we have to do that anyway in the comparative metagenomics file. I think Noah may have done the wrong notebook. Or maybe we're meant to do a mia version of both the comparative metagenomics guide and the fetching metadata guide? Let me know and then I'll make those grammar changes I used what he did for section 1 of this guide - https://docs.mgnify.org/src/notebooks/R%20Examples/Comparative%20Metagenomics.html - but maybe that was the part we needed not the fetch metadata guide |
Great thanks will do |
@TuomasBorman might add some comments but basically the notebook that you linked is based on the old version of MGnifyR. We recently rewrote the package to clean the code base and to add support for TreeSE/MAE. Some function names might have changed, and there may be some other issues (argument names, changes in output formats how they are being processed). Make sure that the new workflow is tested with the latest Bioconductor development version of MGnifyR. We support switching from phyloseq to TreeSE/MAE framework. Therefore it is justified to update (or rather, create a new version) of this notebook to show how to do this for TreeSE/MAE with the new upgraded MGnifyR package. Imo. it is useful to have the two notebooks: this one that shows how to fetch data with MGnifyR and just get started with TreeSE/MAE (instead of phyloseq). And then the other one that focuses on the actual downstream analyses. However if it starts to feel that there is too much overlap and these are better merged, we could think about that. |
@antagomir - I've just pushed a full updated - should be very close to finished now, it's quite polished. I'm sure there'll be a few changes though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good! Just some remarks and suggestions still.
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
# Prepare the cross-validation data frame. | ||
# Transpose the assay matrix to have samples as rows and features (taxa) as columns | ||
assay <- t(assay) | ||
|
||
# Convert the transposed matrix to a data frame for easier manipulation and analysis | ||
df <- as.data.frame(assay) | ||
|
||
# Extract the geographic location labels from the colData of the tse object | ||
labels <- colData(filtered_tse)$sample_geographic.location..country.and.or.sea.region. | ||
|
||
# Simplify the labels to make them valid R variable names | ||
labels <- gsub(" ", "_", labels) | ||
labels <- gsub("[:.]", "", labels) | ||
|
||
# Convert the geographic location labels to a factor (categorical variable) | ||
labels <- as.factor(labels) | ||
|
||
# Add the geographic location labels as a new column in the data frame | ||
df$geo_location <- labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use mia::meltSE() for the same outcome?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meltSE puts data into long format. If I read this correctly, the data here is in wide format (there are as many columns as there are features)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meltSE would be more standard for this given purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I definitely use meltSE and convert to long format? I think it adds a few extra steps - the original may be easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that it would help to make this shorter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't mikropml take wide format data though?
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.qmd
Outdated
Show resolved
Hide resolved
# Prepare the cross-validation data frame. | ||
# Transpose the assay matrix to have samples as rows and features (taxa) as columns | ||
assay <- t(assay) | ||
|
||
# Convert the transposed matrix to a data frame for easier manipulation and analysis | ||
df <- as.data.frame(assay) | ||
|
||
# Extract the geographic location labels from the colData of the tse object | ||
labels <- colData(filtered_tse)$sample_geographic.location..country.and.or.sea.region. | ||
|
||
# Simplify the labels to make them valid R variable names | ||
labels <- gsub(" ", "_", labels) | ||
labels <- gsub("[:.]", "", labels) | ||
|
||
# Convert the geographic location labels to a factor (categorical variable) | ||
labels <- as.factor(labels) | ||
|
||
# Add the geographic location labels as a new column in the data frame | ||
df$geo_location <- labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meltSE puts data into long format. If I read this correctly, the data here is in wide format (there are as many columns as there are features)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good! some remarks..
src/notebooks/R Mia Examples/Comparative-Metagenomics.rmarkdown
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.rmarkdown
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.rmarkdown
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.rmarkdown
Outdated
Show resolved
Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.rmarkdown
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just change from qmd to ipynb as discussed in #49
notebooks.Rproj
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should be removed
So this is the newest EBI notebooks section. There were the following issues/questions that came up -
The newer getTop() mia function wasn't working. So I had to use the GetTopFeatures() method. I'm on version 1.12.0 which I believe is the latest one
Similarly, the newer rarefyAssay() function wasn't found, so I used subsampleCounts() instead.
In the section titled "Comparative metagenomics at community level: Beta diversity" you need a mountford distance matrix for the adonis2 and betadisper methods. And when using the runMDS() method, you can't then convert the matrix that method produces to a dist object as it throws an error that it's not a square matrix. Or at least I wasn't sure how to do this. So I called the vegan package directly.
By the way, this PR includes the changes that Noah had made, I corrected one thing on one of his files, but other than that assumed his PR was correct.
@TuomasBorman @antagomir