-
Notifications
You must be signed in to change notification settings - Fork 0
Day1 Task2
Source: https://github.com/GeoGenetics/data-analysis-2024/tree/main/reproducible-data-analysis/day1/task2
Data: https://github.com/GeoGenetics/data-analysis-2024/tree/main/reproducible-data-analysis/data/mapping
Environment: day1
This is a Bash script that takes command-line arguments, runs Bowtie2 to map reads, sorts the resulting BAM file with Samtools, and cleans up intermediate files.
Click to get the code used to generate the image!
Paste this code into the Mermaid Editor graph TD
A[Input files] --> B{Command line arguments}
B -->|index| C(index)
B -->|reads| D(reads)
B -->|threads| E(threads)
C --> F[Reference index]
D --> G[Input reads]
E -->|Default: 2| H[Number of threads]
F -->|Required| I{Map reads with Bowtie2}
G --> I
I --> J{Sort with Samtools}
J --> K{Clean up intermediate files}
K --> L[Output BAM file]
Let's go through the script step by step to understand what it does and how it works.
-
Input validation and argument parsing. This step checks the user input to make sure all required arguments are present, and that the specified files and options are valid. It also sets default values for any optional arguments that were not specified.
#!/bin/bash set -euo pipefail # Enable strict mode trap 'echo "Error: ${BASH_SOURCE}:${LINENO}: ${BASH_COMMAND}" >&2' ERR trap 'echo "Interrupted" >&2 ; exit 1' INT # Define usage message usage() { echo "Usage: $(basename "$0") -i <index> -r <reads> [-t <threads>]" } # Parse command-line arguments while getopts ":i:r:t:" opt; do case ${opt} in i) index="${OPTARG}" ;; r) reads="${OPTARG}" ;; t) threads="${OPTARG}" ;; \?) echo "Invalid option: -${OPTARG}" >&2 usage exit 1 ;; :) echo "Option -${OPTARG} requires an argument" >&2 usage exit 1 ;; esac done # Check that required options were provided if [[ -z "${index:-}" || -z "${reads:-}" ]]; then usage exit 1 fi # Set default values threads="${threads:-2}"
-
Read mapping with Bowtie2 and Converting SAM to BAM and filtering mapped reads. This step uses Bowtie2, a short-read aligner, to map reads from the input FASTQ file(s) to a reference genome. Then, pipes the results from Bowtie2 into SAMtools to convert the SAM file output from the previous step to a binary BAM format. It then filters out any reads that were not successfully mapped to the reference genome, resulting in a BAM file containing only the mapped reads.
# Construct output file paths file_name=$(basename ${reads}) bam_mapped_file="${file_name%%[ .]*}".mapped.bam bam_file="${file_name%%[ .]*}".sorted.bam # Run Bowtie2 to map the reads bowtie2 --threads "${threads}" -x "${index}" -U "${reads}" \ | samtools view -@ "${threads}" -bS -F 4 > "${bam_mapped_file}"
-
Sorting the BAM file. This step uses SAMtools to sort the mapped BAM file by genomic coordinates, which is necessary for downstream analysis. Are you comfortable with the concept of sorting a BAM file? Do you have any questions about the sorting process?
# Sort the resulting BAM file with Samtools samtools sort -@ "${threads}" -o "${bam_file}" "${bam_mapped_file}"
-
Cleaning up intermediate files. This step removes any intermediate files generated during the previous steps, leaving only the final sorted BAM file.
# Clean up intermediate files rm -rf "${bam_mapped_file}"
We can call this script as follows:
cd ~/course/wdir/data-analysis-2024/reproducible-data-analysis/day1/task2
./mapping-and-sorting.sh -r ../../data/mapping/sample1.fq.gz -i ../../data/mapping/SRS121011 -t 2
- Day 0
- Day 1
- Setting up the environment
- Task 1: Writing a simple BASH script
- Task 2: Writing a More Complex BASH script
- Day 2
- Resources