Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding mia examples #50

Open
wants to merge 84 commits into
base: main
Choose a base branch
from

Conversation

SHillman836
Copy link

@SHillman836 SHillman836 commented Jul 15, 2024

So this is the newest EBI notebooks section. There were the following issues/questions that came up -

  1. The newer getTop() mia function wasn't working. So I had to use the GetTopFeatures() method. I'm on version 1.12.0 which I believe is the latest one

  2. Similarly, the newer rarefyAssay() function wasn't found, so I used subsampleCounts() instead.

  3. In the section titled "Comparative metagenomics at community level: Beta diversity" you need a mountford distance matrix for the adonis2 and betadisper methods. And when using the runMDS() method, you can't then convert the matrix that method produces to a dist object as it throws an error that it's not a square matrix. Or at least I wasn't sure how to do this. So I called the vegan package directly.

By the way, this PR includes the changes that Noah had made, I corrected one thing on one of his files, but other than that assumed his PR was correct.

@TuomasBorman @antagomir

SandyRogers and others added 30 commits February 7, 2022 17:32
… with deep-linked variables. style and docs updates.
…jupyter

Fixes for running inside Shiny Proxy
…ve_metagenomics

Comparative metagenomics
…BI-Metagenomics#10)

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
…-Metagenomics#9)

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Sandy Rogers <[email protected]>
* docs: update README.md [skip ci]

* docs: update .all-contributorsrc [skip ci]

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* docs: update README.md [skip ci]

* docs: update .all-contributorsrc [skip ci]

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Sandy Rogers <[email protected]>
…ent (EBI-Metagenomics#17)

* docs: update .all-contributorsrc [skip ci]

* docs: update README.md [skip ci]

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Sandy Rogers <[email protected]>
…BI-Metagenomics#18)

* docs: update README.md [skip ci]

* docs: update .all-contributorsrc [skip ci]

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Copy link

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I do not have now time to review this in detail. In general, this looks good. However, some operations are done little bit too complicated. I commented at least some of them, check the whole code if there is room to simplify. As this is overview to MGnifyR and mia, we should make the examples rather simple. By that I mean that we should use methods found in SE ecosystem if possible as they make things easier.

About mia version issue. It would be best to have the most updated version of mia so that we do not have to update function names etc in the future. Of course that depends also on the server requirements and stuff like that.

You can find info on installing development version of mia from here https://microbiome.github.io/OMA/docs/devel/pages/06_packages.html#package-installation

The easiest is to install it from GitHub. (However, installing devel version from Bioconductor is usually preferred).

Copy link

@antagomir antagomir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments for starters.

@antagomir
Copy link

How does this PR relate to #49? Can we close #49 if it has been merged with this?

@SHillman836
Copy link
Author

SHillman836 commented Jul 16, 2024

How does this PR relate to #49? Can we close #49 if it has been merged with this?

Sorry didn't mean to close this have just reoopened it now. Yes we can close the other PR

@SHillman836 SHillman836 reopened this Jul 16, 2024
Copy link

@antagomir antagomir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok herewego! Some more suggestions. The essential thing is to add more support for using the (Tree)SE ecosystem tools

src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
@SHillman836
Copy link
Author

SHillman836 commented Jul 16, 2024

Some comments for starters.

@antagomir I was wondering - the "Fetch analyses metadata for a study" file that I pulled from the other PR - should I just remove it completely?

I think there was some confusion. It's basically a replica of this notebook - https://docs.mgnify.org/src/notebooks/R%20Examples/Fetch%20Analyses%20metadata%20for%20a%20Study.html

But the only difference is the last section where we convert to mia vs phyloseq. But I don't really see it being needed, because we have to do that anyway in the comparative metagenomics file.

I think Noah may have done the wrong notebook. Or maybe we're meant to do a mia version of both the comparative metagenomics guide and the fetching metadata guide? Let me know and then I'll make those grammar changes

I used what he did for section 1 of this guide - https://docs.mgnify.org/src/notebooks/R%20Examples/Comparative%20Metagenomics.html - but maybe that was the part we needed not the fetch metadata guide

@SHillman836
Copy link
Author

Ok herewego! Some more suggestions. The essential thing is to add more support for using the (Tree)SE ecosystem tools

Great thanks will do

@antagomir
Copy link

@TuomasBorman might add some comments but basically the notebook that you linked is based on the old version of MGnifyR. We recently rewrote the package to clean the code base and to add support for TreeSE/MAE.

Some function names might have changed, and there may be some other issues (argument names, changes in output formats how they are being processed). Make sure that the new workflow is tested with the latest Bioconductor development version of MGnifyR.

We support switching from phyloseq to TreeSE/MAE framework. Therefore it is justified to update (or rather, create a new version) of this notebook to show how to do this for TreeSE/MAE with the new upgraded MGnifyR package.

Imo. it is useful to have the two notebooks: this one that shows how to fetch data with MGnifyR and just get started with TreeSE/MAE (instead of phyloseq). And then the other one that focuses on the actual downstream analyses.

However if it starts to feel that there is too much overlap and these are better merged, we could think about that.

@SHillman836
Copy link
Author

@antagomir - I've just pushed a full updated - should be very close to finished now, it's quite polished. I'm sure there'll be a few changes though.

Copy link

@antagomir antagomir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good! Just some remarks and suggestions still.

src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
Comment on lines 561 to 579
# Prepare the cross-validation data frame.
# Transpose the assay matrix to have samples as rows and features (taxa) as columns
assay <- t(assay)

# Convert the transposed matrix to a data frame for easier manipulation and analysis
df <- as.data.frame(assay)

# Extract the geographic location labels from the colData of the tse object
labels <- colData(filtered_tse)$sample_geographic.location..country.and.or.sea.region.

# Simplify the labels to make them valid R variable names
labels <- gsub(" ", "_", labels)
labels <- gsub("[:.]", "", labels)

# Convert the geographic location labels to a factor (categorical variable)
labels <- as.factor(labels)

# Add the geographic location labels as a new column in the data frame
df$geo_location <- labels

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use mia::meltSE() for the same outcome?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meltSE puts data into long format. If I read this correctly, the data here is in wide format (there are as many columns as there are features)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meltSE would be more standard for this given purpose?

Copy link
Author

@SHillman836 SHillman836 Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I definitely use meltSE and convert to long format? I think it adds a few extra steps - the original may be easier

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that it would help to make this shorter?

Copy link
Author

@SHillman836 SHillman836 Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't mikropml take wide format data though?

Copy link

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

notebooks.Rproj Outdated Show resolved Hide resolved
dependencies/conda Outdated Show resolved Hide resolved
.Rprofile Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/_resources/mgnifyr_help.md Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
Comment on lines 561 to 579
# Prepare the cross-validation data frame.
# Transpose the assay matrix to have samples as rows and features (taxa) as columns
assay <- t(assay)

# Convert the transposed matrix to a data frame for easier manipulation and analysis
df <- as.data.frame(assay)

# Extract the geographic location labels from the colData of the tse object
labels <- colData(filtered_tse)$sample_geographic.location..country.and.or.sea.region.

# Simplify the labels to make them valid R variable names
labels <- gsub(" ", "_", labels)
labels <- gsub("[:.]", "", labels)

# Convert the geographic location labels to a factor (categorical variable)
labels <- as.factor(labels)

# Add the geographic location labels as a new column in the data frame
df$geo_location <- labels

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meltSE puts data into long format. If I read this correctly, the data here is in wide format (there are as many columns as there are features)

Copy link

@antagomir antagomir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good! some remarks..

src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
src/notebooks/R Mia Examples/Comparative-Metagenomics.qmd Outdated Show resolved Hide resolved
Copy link

@antagomir antagomir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just change from qmd to ipynb as discussed in #49

notebooks.Rproj Outdated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants