Adding mia examples #50

Open: wants to merge 84 commits into main
84 commits
bec04a9
Initial commit
SandyRogers Feb 7, 2022
d8bced8
adds docker setups for local and shinyproxy; first notebooks
SandyRogers Feb 7, 2022
a757cc1
updates container config for quay.io
SandyRogers Feb 8, 2022
2c8c1c7
updates R notebooks: cheat sheet; output removal; cross-study taxonom…
SandyRogers Feb 8, 2022
af7a2dc
use upstream jupyter/datascience-notebook layer instead of shiny-proxy's
SandyRogers Feb 8, 2022
604da99
pins some dependencies for a more reproducible build
SandyRogers Feb 9, 2022
3332498
adds a custom jupyter lab extension to redirect jupyterlab to specifi…
SandyRogers Feb 9, 2022
634b9d9
adds support for setting ENV VARs via query params. updates notebooks…
SandyRogers Feb 10, 2022
fcbbe13
Merge pull request #1 from EBI-Metagenomics/upstream-jupyter
SandyRogers Feb 10, 2022
5ce5715
cleanup of jl extension: subsume license and remove GHA
SandyRogers Feb 11, 2022
1bd99fb
Adds integration tests (#2)
SandyRogers Feb 11, 2022
0f7e0ac
adds integration status badge
SandyRogers Feb 11, 2022
cfbdd60
bioconda SIAMCAT install
Ales-ibt Jul 8, 2022
91d6090
Update environment.yml
Ales-ibt Jul 8, 2022
f02a5e4
Install metagenomeseq
Ales-ibt Jul 12, 2022
ff0bca7
Merge pull request #4 from EBI-Metagenomics/comparative_metagenomics
Ales-ibt Aug 2, 2022
b7cd231
Comparative metagenomics (#5)
Ales-ibt Aug 19, 2022
98aa836
docs: add SandyRogers as a contributor for code, example, and 3 more …
allcontributors[bot] Nov 9, 2022
1b2a062
docs: add Ales-ibt as a contributor for code, example, and ideas (#9)
allcontributors[bot] Nov 9, 2022
2f61427
Comparative metagenomics siamcat (#6)
Ales-ibt Nov 9, 2022
b4e7686
adds jupyter-lab extension with MGnify help (#12)
SandyRogers Nov 10, 2022
ad64119
updates comparative metagenomics notebook for lib upgrades
SandyRogers Nov 11, 2022
6f9c305
docs: add bebatut as a contributor for infra (#15)
allcontributors[bot] Nov 13, 2022
7d1b4ee
docs: add bgruening as a contributor for infra (#16)
allcontributors[bot] Nov 13, 2022
f4b8887
docs: add vestalisvirginis as a contributor for ideas, code, and cont…
allcontributors[bot] Nov 13, 2022
dea36fc
fixes all-contributors config
SandyRogers Nov 13, 2022
7dd1c3b
docs: add mberacochea as a contributor for ideas, code, and 2 more (#18)
allcontributors[bot] Nov 13, 2022
b89fe4a
rationalizing docker images and speeding up cache population
SandyRogers Nov 13, 2022
c93867b
updates shinyproxy on GHA tests
SandyRogers Nov 14, 2022
7301308
fixes shinyproxy version in tests config
SandyRogers Nov 14, 2022
9c9c1b6
(re)adds notebooks to docker image during build
SandyRogers Nov 14, 2022
4ddef76
Static (preview) rendering (#19)
SandyRogers Nov 16, 2022
ba2dba2
Update issue templates
SandyRogers Nov 18, 2022
74e2e1b
Siamcat2 interpretation plot (#20)
Ales-ibt Dec 8, 2022
47af244
adds info about deployment
SandyRogers Dec 9, 2022
d014d08
Adds static documentation (docs.mgnify.org) (#22)
SandyRogers Jan 31, 2023
88d0972
fixes case sensitive glossary links
SandyRogers Feb 7, 2023
5809641
Separating python and r kernels into their own conda envs (#23)
SandyRogers Mar 9, 2023
af478d4
simplifies mgnify_query notebook for faster rendering
SandyRogers Mar 10, 2023
691d0a8
Biohackaton2022 genomes nb (#11)
vestalisvirginis Jul 28, 2023
d809398
Added GSC workshop files
tgurbich Aug 3, 2023
3e8525f
Fixes
tgurbich Aug 3, 2023
d1fe1e1
Cleaned execution printouts
tgurbich Aug 3, 2023
ad0bc28
Implemented suggestions from review
tgurbich Aug 3, 2023
fce5d7c
Merge pull request #28 from EBI-Metagenomics/gsc_workshop
tgurbich Aug 3, 2023
b2869cb
Corrected typos
tgurbich Aug 4, 2023
2bd0b89
Merge pull request #29 from EBI-Metagenomics/gsc_corrections
tgurbich Aug 4, 2023
980ea17
docs: add tgurbich as a contributor for ideas, code, and content (#31)
allcontributors[bot] Aug 4, 2023
5c43d23
Pathways vis (#26)
Ales-ibt Aug 31, 2023
72c34b0
docs: add amartyanambiar as a contributor for code, example, and idea…
allcontributors[bot] Aug 31, 2023
c1c9bc0
updates mgnifyr-cache compress
SandyRogers Aug 31, 2023
51c9fcc
Multi stage build (#33)
SandyRogers Sep 11, 2023
6ccdd14
static render fixes and cleanup
SandyRogers Sep 12, 2023
f837385
Merge remote-tracking branch 'origin/main'
SandyRogers Sep 12, 2023
64f8d83
AtlantECO notebook (#35)
KateSakharova Sep 25, 2023
3a3cfdd
docs: add KateSakharova as a contributor for ideas, code, and content…
allcontributors[bot] Sep 25, 2023
66d2608
Push built containers to registry (#38)
SandyRogers Sep 26, 2023
f691ab1
free up disk space during preview build to make space for docker img
SandyRogers Sep 26, 2023
0c69fcc
Updated submission link to reflect redirection of submit page, to the…
MGS-sails Sep 29, 2023
f099158
Added text update suggestion from Lorna
MGS-sails Oct 2, 2023
27f1de6
Merge pull request #40 from EBI-Metagenomics/data-flow-updates
MGS-sails Oct 2, 2023
9934d5b
adds details of sourmash command and parameters used by MGnify, to docs
SandyRogers Oct 18, 2023
0449b05
Adds documentation page about "additional analyses" (RO-Crates) (#41)
SandyRogers Nov 9, 2023
f821261
Update MGnifyR repo (#42)
SandyRogers Nov 9, 2023
0114f51
Update genome-viewer.md
tgurbich Feb 7, 2024
088873e
Update src/docs/genome-viewer.md
tgurbich Feb 7, 2024
6fbc862
Merge pull request #44 from EBI-Metagenomics/genome-viewer-update
tgurbich Feb 7, 2024
90b8256
Small bug fixed on Pathways Vis notebook (#43)
Ales-ibt Feb 15, 2024
cca64d9
Multiomics docu (#45)
Ales-ibt Mar 8, 2024
639f06c
do not render atlanteco notebook into docs
SandyRogers Mar 8, 2024
953b077
extra try to not render atlanteco notebook
SandyRogers Mar 8, 2024
9ae8aec
fix "Search for Samples or Studies" R notebook: sparse df merge
SandyRogers Mar 12, 2024
e2dfe77
Fix/update atlanteco (#47)
KateSakharova Apr 11, 2024
a37a29b
Started adding in the comparative metagenomics in mia code
SHillman836 Jul 10, 2024
2515770
quick gitignore change
SHillman836 Jul 10, 2024
6dbcc07
added .Rdata to gitignore
SHillman836 Jul 10, 2024
cf0c292
finished part 2
SHillman836 Jul 10, 2024
479e65c
finished notebook draft
SHillman836 Jul 12, 2024
ef696f3
Merge remote-tracking branch 'upstream/main' into adding-mia-examples
SHillman836 Jul 15, 2024
7a5b6db
updated changes
SHillman836 Jul 19, 2024
806fa4d
updated changes
SHillman836 Jul 24, 2024
88d8d75
updated changes
SHillman836 Jul 25, 2024
0dca210
changed file format
SHillman836 Jul 26, 2024
994e0f1
Delete notebooks.Rproj
TuomasBorman Sep 16, 2024
8 changes: 7 additions & 1 deletion .gitignore
@@ -16,10 +16,16 @@ src/docs/*.html
src/*.html
src/notebooks/**/*.html
src/*-listing.json
renv/
renv.lock
.Rhistory

*.parquet
!**/example-data/**/*.parquet
*.sig
ko*.pathview.png
src/notebooks/R Examples/*.tsv
src/notebooks/R Examples/*.txt
src/notebooks/R Examples/*.txt
.Rproj.user
.RData
.Rprofile
902 changes: 902 additions & 0 deletions src/notebooks/R Mia Examples/Comparative-Metagenomics.ipynb

Large diffs are not rendered by default.

143 changes: 143 additions & 0 deletions src/notebooks/R Mia Examples/Fetch-Analyses-metadata-for-a-Study.ipynb
@@ -0,0 +1,143 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"title: \"Fetch analyses metadata for a study\"\n",
"author:\n",
" - name: Noah de Gunst\n",
" affiliation:\n",
" - id: mia\n",
" name: Department of Computing, University of Turku, Finland\n",
" - name: Sam Hillman\n",
" affiliation:\n",
" - id: mia\n",
" name: Department of Computing, University of Turku, Finland\n",
"categories: [R]\n",
"execute: \n",
" eval: true\n",
"---\n",
"\n",
"::: {style=\"max-width:1200px\"}\n",
"![](./../_resources/mgnify_logo.png)\n",
"\n",
"# Fetch a study using MGnifyR; download the metadata for all of its analyses\n",
"\n",
"The [MGnify API](https://www.ebi.ac.uk/metagenomics/api/v1) returns data and \n",
"relationships as JSON. [MGnifyR](https://www.bioconductor.org/packages/release/bioc/html/MGnifyR.html) \n",
"is a package to help you read MGnify data into your R analyses.\n",
"\n",
"You can find all of the other \"API endpoints\" using the [Browsable API interface in your web browser](https://www.ebi.ac.uk/metagenomics/api/v1).\n",
"\n",
"This is an interactive code notebook (a Jupyter Notebook). To run this code, click \n",
"into each cell and press the ▶ button in the top toolbar, or press `shift+enter`.\n",
"\n",
"------------------------------------------------------------------------\n",
":::\n",
"\n",
"#### Setting the study accession\n",
"First, we need to specify the accession number of the study we're working with. \n",
"This can be done by setting the `mgnify_study_accession` variable. The accession \n",
"number uniquely identifies the study in the MGnify database.\n",
"\n",
"```{r}\n",
"#| output: false\n",
"\n",
"source(\"./utils/variable_utils.r\")\n",
"\n",
"mgnify_study_accession <- get_variable_from_link_or_input('MGYS', 'Study Accession', \n",
" 'MGYS00005116')\n",
"\n",
"# You can also just directly set the accession variable in code, like this:\n",
"# mgnify_study_accession <- \"MGYS00005292\"\n",
"```\n",
"\n",
"#### Constructing an MgnifyClient object to access the database\n",
"To interact with the MGnify database, we need to create an MgnifyClient object. \n",
"This object allows us to fetch data from MGnify, and we can configure it to use \n",
"a local cache so that repeated requests are not downloaded again. \n",
"\n",
"```{r}\n",
"#| output: false\n",
"# Importing the libraries\n",
"library(vegan)\n",
"library(ggplot2)\n",
"library(mia)\n",
"library(MGnifyR)\n",
"\n",
"# Check if the cache directory exists, if not, create it\n",
"if (!dir.exists(\"./.mgnify_cache\")) {\n",
" dir.create(\"./.mgnify_cache\", recursive = TRUE)\n",
"}\n",
"\n",
"# Create the MgnifyClient object with caching enabled\n",
"mg <- MgnifyClient(usecache = TRUE, cache_dir = \"./.mgnify_cache\")\n",
"```\n",
"\n",
"#### Displaying the help file\n",
"\n",
"```{r}\n",
"#| output: false\n",
"library(IRdisplay)\n",
"display_markdown(file = '../_resources/mgnifyr_help.md')\n",
"```\n",
"\n",
"## Fetch a list of the Analyses for the Study\n",
"Using the MgnifyClient object, we can search for all analyses associated with the \n",
"study accession number we set earlier. This will return a list of analysis accession \n",
"numbers.\n",
"\n",
"```{r}\n",
"#| output: false\n",
"analyses_accessions <- searchAnalysis(mg, \"studies\", mgnify_study_accession)\n",
"analyses_accessions\n",
"```\n",
"\n",
"## Download metadata for the first 10 Analyses\n",
"...and put it into a dataframe.\n",
"\n",
"```{r}\n",
"#| output: false\n",
"analyses_metadata_df <- getMetadata(mg, head(analyses_accessions, 10))\n",
"```\n",
"\n",
"## Display metadata\n",
"The table could be big, so let's look at a sample of it (`head`).\n",
"\n",
"```{r}\n",
"#| output: false\n",
"t(head(analyses_metadata_df))\n",
"```\n",
"\n",
"## Download the data to a multi-assay data object\n",
"\n",
"> [mia](https://microbiome.github.io/mia/) is a Bioconductor package designed to \n",
"import, store and analyze microbiome data using an object called a `TreeSummarizedExperiment`. \n",
"This is a tailored data container optimized for microbiome data analysis. Because \n",
"it is built on the `SummarizedExperiment` class, the miaverse integrates seamlessly into the \n",
"extensive `SummarizedExperiment` ecosystem. In this example we use MGnifyR to download the \n",
"data into a `MultiAssayExperiment` (MAE), which contains multiple `TreeSummarizedExperiment` objects.\n",
"\n",
"\n",
"```{r}\n",
"#| output: false\n",
"mae <- getResult(mg, accession = analyses_accessions)\n",
"```\n",
"\n",
"You can use `MGnifyR` further, for example to download other data. See the cheat \n",
"sheet in the help file displayed above for more."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
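The notebook previews the metadata with `t(head(analyses_metadata_df))` rather than `head()` alone. A minimal base-R sketch of what that does, using a small made-up data frame in place of the real `getMetadata()` output (the field names, values and row names below are purely illustrative, not MGnify fields): when a table has many columns, transposing the first rows puts one field per row and one analysis per column, which is easier to scan.

```r
# Illustrative stand-in for a wide metadata table: 12 analyses (rows) and
# 4 metadata fields (columns). Names and values are made up.
toy_metadata <- data.frame(
  field_a = sprintf("A%02d", 1:12),
  field_b = 1:12,
  field_c = rep(c("x", "y"), 6),
  field_d = seq(0.5, 6, by = 0.5),
  row.names = sprintf("analysis_%02d", 1:12)
)

# head() keeps the first 6 analyses; t() swaps rows and columns, giving a
# character matrix with one row per field and one column per analysis.
preview <- t(head(toy_metadata))
print(dim(preview))
print(preview)
```

Note that `t()` on a data frame with mixed column types goes through `as.matrix()`, so every value in the preview becomes a character string; that is fine for display but not for further computation.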
12 changes: 12 additions & 0 deletions src/notebooks/R Mia Examples/_resources/mgnifyr_help.md
SHillman836 marked this conversation as resolved.
@@ -0,0 +1,12 @@
# Help with MGnifyR

MGnifyR is an R package that provides a convenient way for R users to access data from [the MGnify API](https://www.ebi.ac.uk/metagenomics/api/).

Detailed help for each function is available in R using the standard `?function_name` command.

A vignette is available containing a reasonably verbose overview of the main functionality.
This can be read either within R with the `vignette("MGnifyR")` command, or [on the Bioconductor vignette website](https://www.bioconductor.org/packages/release/bioc/vignettes/MGnifyR/inst/doc/MGnifyR.html).

## MGnifyR Command cheat sheet

For a full list of MGnifyR functions, see the [function reference on the MGnifyR website](https://ebi-metagenomics.github.io/MGnifyR/reference/index.html).
60 changes: 60 additions & 0 deletions src/notebooks/R Mia Examples/utils/kegg_pathways_utils.r
@@ -0,0 +1,60 @@
# Retrieve name and URL for a specific pathway in the KEGG database
# (keggList() is provided by the KEGGREST package and queries KEGG over the network)
get_pathway_info <- function(pathway) {
    pathway <- paste0("map", pathway)
    pathway_name <- keggList(pathway)[[1]]
    pathway_url <- paste0("https://www.kegg.jp/pathway/", pathway)
    return(list(pathway_name = pathway_name, pathway_url = pathway_url))
}


# Function to prompt users for pathway selection and return custom pathway IDs
PathwaysSelection <- function() {
    display_markdown("#### Pathways Selection :\n\n
- For the most general & most complete pathways, input 'G'\n\n
- Press Enter to generate the most complete pathways\n\n
- To add custom pathways, input pathway numbers (ex: 00053,01220)")

    flush.console()
    CUSTOM_PATHWAY_IDS <- get_variable_from_link_or_input('CUSTOM_PATHWAY_IDS', name = 'Pathways Accession', default = '')

    if (CUSTOM_PATHWAY_IDS == "") {
        CUSTOM_PATHWAY_IDS <- list()
    } else if (CUSTOM_PATHWAY_IDS == "G") {
        CUSTOM_PATHWAY_IDS <- list("00010", "00020", "00030", "00061", "01232", "00240", "00190")
    } else {
        # Split on commas and trim spaces, so that "00053, 01220" also works
        CUSTOM_PATHWAY_IDS <- trimws(strsplit(CUSTOM_PATHWAY_IDS, ",")[[1]])
    }

    if (length(CUSTOM_PATHWAY_IDS) > 0) {
        # Look up each pathway once, then report its name and URL
        for (id in CUSTOM_PATHWAY_IDS) {
            info <- get_pathway_info(id)
            message("\nUsing ", id, " - ", info$pathway_name, " : ", info$pathway_url, " as a Custom Pathway")
        }
    } else {
        message("\nUsing NONE as a Custom Pathway")
    }
    return(CUSTOM_PATHWAY_IDS)
}


# Collect the generated figures in `pathway_plots/`, clear the leftover .png/.xml
# files from the current working directory, then display each figure
generatePathwayPlots <- function() {
    if (!dir.exists("pathway_plots")) {
        dir.create("pathway_plots")
    }

    # `pattern` is a regular expression matched against file names
    file.copy(from = list.files(pattern = "pathview\\.png$"), to = "./pathway_plots/", overwrite = TRUE)

    # Clearing the current working directory: note that this deletes every
    # .png and .xml file there, not only the ones pathview created
    files <- list.files(path = ".", pattern = "\\.(png|xml)$")
    unlink(files)

    # Accessing the png files and displaying them
    images <- list.files("pathway_plots", full.names = TRUE)

    for (pathway in images) {
        display_markdown(get_pathway_info(gsub("[^0-9]", "", basename(pathway)))$pathway_name)
        display_png(file = pathway)
    }
}
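`PathwaysSelection()` turns the user's comma-separated input into pathway numbers, `get_pathway_info()` builds `map`-prefixed KEGG IDs and URLs, and `generatePathwayPlots()` recovers the number from each figure's file name. A minimal sketch of just that string handling, without the `keggList()` network call, so it can run offline. The helper names `parse_pathway_ids()`, `pathway_url()` and `id_from_plot_file()` are hypothetical, and the five-digit check is an assumption about what a valid KEGG map number looks like, not something the PR enforces.

```r
# Hypothetical helper: turn "00053, 01220" into c("00053", "01220").
parse_pathway_ids <- function(input) {
  if (is.na(input) || input == "") {
    return(character(0))
  }
  ids <- trimws(strsplit(input, ",", fixed = TRUE)[[1]])
  # Keep only five-digit map numbers; anything else is dropped (assumption).
  ids[grepl("^[0-9]{5}$", ids)]
}

# Hypothetical helper: same URL pattern as get_pathway_info(), minus the
# keggList() lookup that needs network access.
pathway_url <- function(id) {
  paste0("https://www.kegg.jp/pathway/map", id)
}

# Hypothetical helper: recover the numeric ID from a pathview figure name,
# using the same gsub() call as generatePathwayPlots().
id_from_plot_file <- function(path) {
  gsub("[^0-9]", "", basename(path))
}

ids <- parse_pathway_ids("00053, 01220,abc")
print(ids)
print(pathway_url(ids))
print(id_from_plot_file("pathway_plots/ko00010.pathview.png"))
```

Dropping malformed entries before any lookup means `keggList()` is never called with an ID it cannot resolve; whether to drop them silently or warn the user is a design choice this sketch leaves open.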
16 changes: 16 additions & 0 deletions src/notebooks/R Mia Examples/utils/variable_utils.r
@@ -0,0 +1,16 @@
library(glue)

get_variable_from_link_or_input <- function(variable, name = 'accession', default = NA) {
    # Get a variable value, either from an ENV VAR that would have been set by the jlab_query_params extension, or through direct user input.
    var <- Sys.getenv(variable, unset = NA)
    if (!is.na(var)) {
        print(glue('Using {name} = {var} from the link you followed.'))
    } else {
        determiner <- ifelse(grepl(tolower(substr(name, 0, 1)), 'aeiou'), 'an', 'a')
        var <- readline(prompt = glue("Type {determiner} {name} [default: {default}]"))
    }
    var <- ifelse(is.na(var) || var == '', default, var)
    print(glue('Using "{var}" as {name}'))
    var
}
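`get_variable_from_link_or_input()` resolves a value in a fixed order: an environment variable (set by the `jlab_query_params` extension when you follow a link), then interactive input, then the default. A minimal sketch of that order with the `readline()` prompt replaced by an `input` argument, so it can run non-interactively. `resolve_variable()` and `article_for()` are hypothetical names, and `MY_TEST_VAR` is an illustrative variable name.

```r
# Hypothetical, non-interactive variant of get_variable_from_link_or_input():
# the value a user would type at the prompt is passed in as `input`.
resolve_variable <- function(variable, input = "", default = NA) {
  # 1. A value set via the link (an environment variable) wins.
  value <- Sys.getenv(variable, unset = NA)
  if (is.na(value)) {
    # 2. Otherwise, use what the user typed.
    value <- input
  }
  # 3. Fall back to the default when nothing usable was given.
  if (is.na(value) || value == "") default else value
}

# Pick "a" or "an" for the prompt text from the first letter of `name`.
article_for <- function(name) {
  if (tolower(substr(name, 1, 1)) %in% c("a", "e", "i", "o", "u")) "an" else "a"
}

invisible(Sys.unsetenv("MY_TEST_VAR"))  # illustrative variable name
print(resolve_variable("MY_TEST_VAR", input = "", default = "fallback"))
Sys.setenv(MY_TEST_VAR = "from-link")
print(resolve_variable("MY_TEST_VAR", input = "typed", default = "fallback"))
print(article_for("Study Accession"))
```

Using scalar `if`/`else` instead of `ifelse()` makes it explicit that each call handles exactly one value, which is how the original helper is used in the notebooks.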
