Showing 5 changed files with 40 additions and 9 deletions.
# Downloading Microbiome NGS Reads from SRA

---In Progress---

<br>

# Related Works
## IMAP GitHub Repos

| Repo | Description | Status |
|:----------------|:---------------------|--------------:|
| [IMAP-OVERVIEW](https://github.com/datainsights/imap-project-overview/) | IMAP project overview | [In-progress](https://datainsights.github.io/imap-project-overview/) |
| [IMAP-PART 01](https://github.com/tmbuza/imap-essential-software/) | Software requirements for microbiome data analysis with Snakemake workflows | [In-progress](https://tmbuza.github.io/imap-essential-software/) |
| [IMAP-PART 02](https://github.com/tmbuza/imap-sample-metadata/) | Downloading and exploring microbiome sample metadata from the SRA database | [In-progress](https://tmbuza.github.io/imap-sample-metadata/) |
| [IMAP-PART 03](https://github.com/tmbuza/imap-download-sra-reads/) | Downloading and filtering microbiome sequencing data from the SRA database | [In-progress](https://tmbuza.github.io/imap-download-sra-reads/) |
| [IMAP-PART 04](https://github.com/tmbuza/imap-read-quality-control/) | Quality control of microbiome next-generation sequencing reads | [In-progress](https://tmbuza.github.io/imap-read-quality-control/) |
| [IMAP-PART 05](https://github.com/tmbuza/imap-mothur-bioinformatics/) | Microbial profiling using MOTHUR and Snakemake workflows | [In-progress](https://tmbuza.github.io/imap-mothur-bioinformatics/) |
| [IMAP-PART 06](https://github.com/tmbuza/imap-qiime2-bioinformatics/) | Microbial profiling using QIIME2 and Snakemake workflows | [In-progress](https://tmbuza.github.io/imap-qiime2-bioinformatics/) |
| [IMAP-PART 07](https://github.com/tmbuza/imap-data-preparation/) | Processing output from 16S-based microbiome bioinformatics pipelines | [In-progress](https://tmbuza.github.io/imap-data-preparation/) |
| [IMAP-PART 08](https://github.com/tmbuza/imap-data-exploration/) | Exploratory analysis of 16S-based microbiome processed data | [In-progress](https://tmbuza.github.io/imap-data-exploration/) |
| [IMAP-ML](https://github.com/tmbuza/imap-machine-learning/) | Predictive modeling of microbiome data | [In-progress](https://tmbuza.github.io/imap-machine-learning/) |
| [IMAP-SUMMARY](https://github.com/tmbuza/imap-snakemake-workflows/) | Summary of Snakemake workflows for microbiome data analysis | [In-progress](https://tmbuza.github.io/imap-snakemake-workflows/) |

## Session information

For a detailed overview of the tools and versions used in this guide, see the [session information](session_info.txt).
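If `session_info.txt` is produced with R's `sessioninfo` package, as the guide's session-information chapter suggests, it lists R packages but not the command-line tools the guide relies on, such as `fasterq-dump` and `seqkit`. A minimal bash sketch for recording those versions as well follows; the function name `record_tool_versions`, the tool list, and the output file `tool_versions.txt` are hypothetical choices, kept separate so the existing `session_info.txt` is not overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original guide): write the versions of
# the command-line tools used in this guide to a tab-separated text file.
record_tool_versions() {
  local out="${1:-tool_versions.txt}" tool ver
  : > "$out"                                   # start from an empty file
  for tool in fasterq-dump vdb-config seqkit; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      ver="not installed"
    elif [[ "$tool" == "seqkit" ]]; then
      # seqkit reports its version through a subcommand
      ver="$(seqkit version 2>&1 | grep -m1 .)"
    else
      # sra-tools binaries accept --version; keep the first non-empty line
      ver="$("$tool" --version 2>&1 | grep -m1 .)"
    fi
    printf '%s\t%s\n' "$tool" "$ver" >> "$out"
  done
}
```

For example, `record_tool_versions` writes one line per tool to `tool_versions.txt`; a tool that is not on `PATH` is listed as `not installed` rather than aborting the run.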

## Citation
> Please consider citing the [iMAP article](https://rdcu.be/b5iVj) if you find any part of the iMAP practical user guides helpful in your microbiome data analysis.

Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4

## :tada: Raise awareness
> Please help increase awareness of freely available tools for microbiome data analysis.

See the [Dimensions citation metrics for the iMAP article](https://badge.dimensions.ai/details/id/pub.1117740326).

[{"path":"index.html","id":"download-sra","chapter":"IMAP-Part 03: Downloading Microbiome Fastq Sequences from NCBI Database","heading":"IMAP-Part 03: Downloading Microbiome Fastq Sequences from NCBI Database","text":"Welcome Part 3 IMAP Practical User Guides!Explore intricacies microbiome bioinformatics present hands-demonstration “Downloading Microbiome Fastq Sequences SRA Database.” chapter designed guide seasoned researchers newcomers critical process accessing acquiring raw sequencing data Sequence Read Archive (SRA). Join us practical journey unlock potential high-throughput sequencing data, preparing robust microbiome analyses using IMAP platform.","code":""},{"path":"sra-fastq-sequences.html","id":"sra-fastq-sequences","chapter":"1 Retrieving Microbiome Fastq Sequences from the SRA Database","heading":"1 Retrieving Microbiome Fastq Sequences from the SRA Database","text":"Welcome IMAP Practical User Guides, Part 03. segment, offer hands-demonstration focused proficient acquisition “Microbiome Fastq Sequences Sequence Read Archive (SRA) Database.” delve chapter, presumed harbor sincere interest mastering intricacies downloading fastq sequences SRA database—pivotal centralized repository housing short reads Next-Generation Sequencing (NGS) platforms.Acknowledging microbiome read sequencing data can procured diverse sources, including direct acquisition sequencing platforms researchers, downloads Sequence Read Archive (SRA) European Nucleotide Archive (ENA), synthesis sequencing simulators, emphasis tutorial standardized approach provided SRA. approach ensures consistent reporting associated metadata, expounded upon IMAP Part 02 section.Throughout demonstrative session, objective systematically guide process retrieving microbiome sequencing data SRA, promoting uniform methodology. 
adhering standardized approach, aim enhance consistency subsequent bioinformatics analyses.","code":""},{"path":"installing-sra-toolkit.html","id":"installing-sra-toolkit","chapter":"2 Installing SRA Toolkit","heading":"2 Installing SRA Toolkit","text":"Navigate want install tools, preferably home directory.information click .Demo MAC OS","code":"curl -LO https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.0.0/sratoolkit.3.0.0-mac64.tar.gz\ntar -xf sratoolkit.3.0.0-mac64.tar.gz\nexport PATH=$HOME/sratoolkit.3.0.0-mac64/bin/:$PATH"},{"path":"installing-sra-toolkit.html","id":"create-a-cache-root-directory","chapter":"2 Installing SRA Toolkit","heading":"2.1 Create a cache root directory","text":"","code":"mkdir -p ~/ncbi\necho '/repository/user/main/public/root = \"cache_directory\"' > ~/ncbi/user-settings.mkfg"},{"path":"installing-sra-toolkit.html","id":"confirm-sra-toolkit-configuration","chapter":"2 Installing SRA Toolkit","heading":"2.2 Confirm sra toolkit configuration","text":"vdb-config -command display blue colored dialog.Use tab click c navigate cache tab.Review configuration save s exit x.information click .","code":"vdb-config -i"},{"path":"installing-sra-toolkit.html","id":"alternative-method","chapter":"2 Installing SRA Toolkit","heading":"2.3 Alternative method","text":"can create environment install essential tools . Example, sradb using environment.yml.","code":"name: sradb\nchannels:\n - conda-forge\n - bioconda\ndependencies:\n - sra-tools\n - entrez-direct\n - pysradbmamba create -c bioconda -c conda-forge sradb -file environment.yml"},{"path":"downloading-multiple-fastq-files.html","id":"downloading-multiple-fastq-files","chapter":"3 Downloading multiple fastq files","heading":"3 Downloading multiple fastq files","text":"Make sure fasterq-dump path.Type fasterq-dump fasterq-dump --help confirm.Must specify output temporary files.possible specifies range SRA accessions loop.Example code downloading reads NCBI-SRA accessions ranging SRR7450706 SRR7450761.","code":"for (( i = 706; i <= 761; i++ ))\n do\n time fasterq-dump SRR7450$i \\\n --split-3 \\\n --force \\\n --skip-technical \\\n --outdir data/reads \\\n --temp data/temp \\\n --threads 4 \n done\n\n<!--chapter:end:02-download-sequences.Rmd-->\n\n# Resizing Fastq files\n - Sometimes we want to extract a small subset to test the bioinformatics pipeline.\n - You can resize the fastq files using the `seqkit sample` function [@seqkit2022].\n\n\nExample extracting only 1% of the paired-end metagenomics sequencing data.\n\n> This bash script extracts 1% of the reads from only two sample (SRR10245277 to SRR10245280)\n\n```bash\nmkdir -p data\nfor i in {77..80}\n do\n cat SRR102452$i\\_1.fastq \\\n | seqkit sample -p 0.01 \\\n | seqkit shuffle -o data/SRR102452$i\\_1_sub.fastq \\\n | cat SRR102452$i\\_2.fastq \\\n | seqkit sample -p 0.01 \\\n | seqkit shuffle -o data/SRR102452$i\\_2_sub.fastq\n done"},{"path":"imap-github-repos.html","id":"imap-github-repos","chapter":"A IMAP GitHub Repos","heading":"A IMAP GitHub Repos","text":"","code":""},{"path":"snakemake-rule-graph-imap-part-03.html","id":"snakemake-rule-graph-imap-part-03","chapter":"B Snakemake Rule Graph: IMAP-PART 03","heading":"B Snakemake Rule Graph: IMAP-PART 03","text":"","code":""},{"path":"session-information.html","id":"session-information","chapter":"C Session Information","heading":"C Session Information","text":"Reproducibility relies ability precisely recreate working environment, session information serves vital reference achieve consistency. record details R environment, package versions, system settings computing environment time analysis.detailed overview tools versions suitable guide, encourage explore session information saved accompanying text file named session_info.txt,","code":"\nlibrary(knitr)\nlibrary(sessioninfo)\n\n# Get session info\ninfo <- capture.output(print(session_info()))\n\n# Create the 'resources' folder if it doesn't exist\nif (!dir.exists(\"resources\")) {\n dir.create(\"resources\")\n}\n\n# Save the session information to a text file without line numbers\ncat(info, file = \"session_info.txt\", sep = \"\\n\")"},{"path":"references.html","id":"references","chapter":"References","heading":"References","text":"","code":""}]
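The `downloading-multiple-fastq-files` entry in the search index above embeds a paired-end subsampling loop in which `seqkit shuffle -o …` writes to a file, so the next `| cat SRR102452$i\_2.fastq` stage receives nothing on its input and simply reads the second mate file. The loop runs, but only because `cat` ignores its input; its note also says "two sample" although `{77..80}` covers four accessions. The bash sketch below writes the download-then-subsample step more explicitly: it reuses the `fasterq-dump` options from the guide, samples the `_1` and `_2` files separately with the same `seqkit sample` seed so that the same record positions should be kept in both mates (assuming both files hold the same reads in the same order), and omits the `seqkit shuffle` step, which subsampling does not need. The accession-list file, the `data/sub` output folder, the `DRY_RUN` switch, and the `run` and `download_and_subsample` functions are illustrative assumptions, not part of the guide.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions: accessions come from a plain-text file,
# one per line; DRY_RUN=1 prints each command instead of executing it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# download_and_subsample ACC_FILE [PROPORTION] [SEED]  (hypothetical helper)
download_and_subsample() {
  local acc_file="$1"
  local proportion="${2:-0.01}"   # keep about 1% of reads, as in the guide
  local seed="${3:-11}"           # any fixed value, identical for both mates
  local reads_dir="data/reads" sub_dir="data/sub" acc mate

  run mkdir -p "$reads_dir" data/temp "$sub_dir" || return 1
  while IFS= read -r acc || [[ -n "$acc" ]]; do
    acc="${acc//[[:space:]]/}"    # tolerate CRLF endings and stray spaces
    [[ -z "$acc" ]] && continue   # skip blank lines
    # Same fasterq-dump options as the guide's download loop
    run fasterq-dump "$acc" --split-3 --force --skip-technical \
      --outdir "$reads_dir" --temp data/temp --threads 4 || return 1
    for mate in 1 2; do
      # Sample each mate file on its own; the shared seed and proportion
      # select the same record positions in _1 and _2
      run seqkit sample -s "$seed" -p "$proportion" \
        -o "${sub_dir}/${acc}_${mate}_sub.fastq" \
        "${reads_dir}/${acc}_${mate}.fastq" || return 1
    done
  done < "$acc_file"
}
```

For example, `DRY_RUN=1 download_and_subsample accessions.txt` prints the planned `fasterq-dump` and `seqkit sample` commands for every accession listed in `accessions.txt`; running it without `DRY_RUN=1` executes them.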