
Convert amplicon16S_analysis workflow (QIIME) from single-sample to multi-sample #452

Merged Feb 16, 2023 · 31 commits (diff shown from 28 commits)

Commits:
c7b99ce
optimizing array for deblur step
golu099 Jan 12, 2023
4291a1a
Attempting array deblur
golu099 Jan 12, 2023
ee9d53a
Merge remote-tracking branch 'origin/master' into fnegrete_test
golu099 Jan 19, 2023
30c5bf9
Array of BAM file
golu099 Jan 19, 2023
fd8310d
removing sampleNme
golu099 Jan 19, 2023
0ac435d
chanigng array in wfl
golu099 Jan 19, 2023
ea7f0f6
worked on for loop
golu099 Jan 19, 2023
045b2a4
remove patterns in parameter_meta
golu099 Jan 19, 2023
4a48412
fix for loops
golu099 Jan 19, 2023
d9c4ffd
samtools issue
golu099 Jan 20, 2023
b2a6f85
samtools fix
golu099 Jan 25, 2023
f05915e
samtools fix
golu099 Jan 25, 2023
ad1ba90
samtools fix
golu099 Jan 26, 2023
b601cb9
samtools fix
golu099 Jan 26, 2023
d66f814
conda env
golu099 Jan 26, 2023
b9e930c
conda env
golu099 Jan 26, 2023
f5dc402
conda removed
golu099 Jan 26, 2023
fff6982
fix
golu099 Jan 26, 2023
a0415fa
list fix
golu099 Jan 26, 2023
0b6a8a6
trim seq fix
golu099 Jan 27, 2023
c14f753
commiting fixes
golu099 Feb 3, 2023
1cac96a
removing sample name
golu099 Feb 3, 2023
46c2af6
fixing input request
golu099 Feb 3, 2023
2be4ff8
Updating array file on import
golu099 Feb 7, 2023
083757d
optimizing disk size
golu099 Feb 9, 2023
cda833d
JSON updates
golu099 Feb 9, 2023
97c8ef0
JSON update
golu099 Feb 9, 2023
66f7470
optimzing run time
golu099 Feb 10, 2023
62fa8e8
addressing Danny's suggestions
golu099 Feb 14, 2023
3dff808
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Feb 14, 2023
73a5d77
Update pipes/WDL/tasks/tasks_16S_amplicon.wdl
golu099 Feb 15, 2023
113 changes: 57 additions & 56 deletions pipes/WDL/tasks/tasks_16S_amplicon.wdl
@@ -5,47 +5,53 @@ task qiime_import_from_bam {
description: "Parsing demultiplexed fastq BAM files into qiime readable files."
}
input {
File reads_bam
String sample_name
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(reads_bam, "GiB")) + 5
Array[File] reads_bam
Int memory_mb = 7000
Int cpu = 5
Int disk_size_gb = ceil(2*20) + 5
String docker = "quay.io/broadinstitute/qiime2:conda"
}
parameter_meta {
reads_bam: {description: "Input BAM file"}
reads_bam: {description: "Input BAM files"}
}

command <<<
set -ex -o pipefail

#Part 1A | BAM -> FASTQ [Simple samtools command]
samtools fastq -1 $(pwd)/R1.fastq.gz -2 $(pwd)/R2.fastq.gz -0 /dev/null ~{reads_bam}
#making new bash variable | regex: (_) -> (-)
NEWSAMPLENAME=$(echo "~{sample_name}" | perl -lape 's/[_]/-/g')
#All names added to one giant file
echo ${NEWSAMPLENAME} > NEWSAMPLENAME.txt
#Make a manifest.txt that contains [1.sampleid 2.R1_fastq 3.R2.fastq]
#> =overwrite or writes new file
manifest_TSV=manifest.tsv
echo -e "sample-id\tforward-absolute-filepath\treverse-absolute-filepath" > manifest.tsv
#>>= appends
#\t= tabs next value
echo -e "$NEWSAMPLENAME\t$(pwd)/R1.fastq.gz\t$(pwd)/R2.fastq.gz" >> manifest.tsv

for bam in ~{sep=' ' reads_bam}; do
#making new bash variable | regex: (_) -> (-)
NEWSAMPLENAME=$(basename $bam .bam | perl -lape 's/[_]/-/g')
echo $NEWSAMPLENAME
samtools fastq -1 $NEWSAMPLENAME.R1.fastq.gz -2 $NEWSAMPLENAME.R2.fastq.gz -0 /dev/null $bam
#All names added to one giant file
#up to here works...not reading the manifest tsv for some reason
echo $NEWSAMPLENAME >> NEWSAMPLENAME.txt
#>=replaces
#>>= appends
#\t= tabs next value
echo -e "$NEWSAMPLENAME\t$(pwd)/$NEWSAMPLENAME.R1.fastq.gz\t$(pwd)/$NEWSAMPLENAME.R2.fastq.gz"
echo -e "$NEWSAMPLENAME\t$(pwd)/$NEWSAMPLENAME.R1.fastq.gz\t$(pwd)/$NEWSAMPLENAME.R2.fastq.gz" >> manifest.tsv
done
# debug
cat manifest.tsv
#fastq -> bam (provided by qiime tools import fxn)
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest.tsv \
--input-format PairedEndFastqManifestPhred33V2 \
--output-path "~{sample_name}.qza"
--output-path "batch.qza"
>>>

output {
File reads_qza = "~{sample_name}.qza"
File reads_qza = "batch.qza"
String samplename_master_sheet = read_string("NEWSAMPLENAME.txt")
}
runtime {
docker: docker
memory: "${memory_mb} MiB"
memory: "~{memory_mb} MiB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
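The manifest-building loop in the new `qiime_import_from_bam` command block can be exercised on its own, without samtools or QIIME. A minimal sketch using placeholder BAM names (hypothetical; the real task derives them from the `Array[File]` input) to check the underscore-to-dash sanitization and the tab-separated manifest layout:

```shell
# Sketch of the manifest loop from qiime_import_from_bam, run on placeholder
# BAM names so the sanitization and TSV layout can be checked in isolation.
set -e
cd "$(mktemp -d)"

# header row expected by the PairedEndFastqManifestPhred33V2 format
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > manifest.tsv

for bam in sample_1.bam sample_2.bam; do
    # underscores become dashes, mirroring the PR's perl regex s/[_]/-/g
    NEWSAMPLENAME=$(basename "$bam" .bam | tr '_' '-')
    echo "$NEWSAMPLENAME" >> NEWSAMPLENAME.txt
    # one row per sample: id, absolute R1 path, absolute R2 path
    printf '%s\t%s/%s.R1.fastq.gz\t%s/%s.R2.fastq.gz\n' \
        "$NEWSAMPLENAME" "$PWD" "$NEWSAMPLENAME" "$PWD" "$NEWSAMPLENAME" >> manifest.tsv
done
cat manifest.tsv
```

Note that collecting all sample IDs into a single `NEWSAMPLENAME.txt` only surfaces the first line through `read_string` in the task's outputs; an `Array[String]` output via `read_lines` would expose every ID.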
@@ -63,15 +69,13 @@ task trim_reads {

input {
File reads_qza

String qza_basename = basename(reads_qza, '.qza')
#Boolean not_default = false
String forward_adapter = "CTGCTGCCTCCCGTAGGAGT"
String reverse_adapter = "AGAGTTTGATCCTGGCTCAG"
Int min_length = 1
Boolean keep_untrimmed_reads = false
Int memory_mb = 2000
Int cpu = 1
Int cpu = 3
Int disk_size_gb = ceil(2*size(reads_qza, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2:conda"
}
@@ -84,18 +88,18 @@ task trim_reads {
--p-front-r "~{reverse_adapter}" \
~{"--p-minimum-length " + min_length} \
~{true='--p-no-discard-untrimmed' false='--p-discard-untrimmed' keep_untrimmed_reads} \
--o-trimmed-sequences "~{qza_basename}_trimmed.qza"
--o-trimmed-sequences "trimmed.qza"

#trim_visual
qiime demux summarize \
--i-data "~{qza_basename}_trimmed.qza" \
--o-visualization "~{qza_basename}_trim_summary.qzv"
--i-data "trimmed.qza" \
--o-visualization "trim_summary.qzv"
>>>

output {
#trimmed_sequences = paired ends for vsearch
File trimmed_reads_qza = "~{qza_basename}_trimmed.qza"
File trimmed_visualization = "~{qza_basename}_trim_summary.qzv"
File trimmed_reads_qza = "trimmed.qza"
File trimmed_visualization = "trim_summary.qzv"
}

runtime {
@@ -115,7 +119,6 @@ task join_paired_ends {
input {
#Input File: Merge paired reads
File trimmed_reads_qza
String reads_basename = basename(trimmed_reads_qza, '.qza')
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(trimmed_reads_qza, "GiB")) + 5
@@ -126,15 +129,15 @@
set -ex -o pipefail
qiime vsearch join-pairs \
--i-demultiplexed-seqs ~{trimmed_reads_qza} \
--o-joined-sequences "~{reads_basename}_joined.qza"
--o-joined-sequences "joined.qza"

qiime demux summarize \
--i-data "~{reads_basename}_joined.qza" \
--o-visualization "~{reads_basename}_visualization.qzv"
--i-data "joined.qza" \
--o-visualization "visualization.qzv"
>>>
output {
File joined_end_reads_qza = "~{reads_basename}_joined.qza"
File joined_end_visualization = "~{reads_basename}_visualization.qzv"
File joined_end_reads_qza = "joined.qza"
File joined_end_visualization = "visualization.qzv"
}
runtime {
docker: docker
@@ -152,7 +155,6 @@ task deblur {
}
input {
File joined_end_reads_qza
String joined_end_basename = basename(joined_end_reads_qza, '.qza')
Int trim_length_var = 300
Int memory_mb = 2000
Int cpu = 1
@@ -163,27 +165,27 @@
set -ex -o pipefail

qiime deblur denoise-16S \
--i-demultiplexed-seqs ~{joined_end_reads_qza} \
--i-demultiplexed-seqs ~{joined_end_reads_qza}\
~{"--p-trim-length " + trim_length_var} \
--p-sample-stats \
--o-representative-sequences "~{joined_end_basename}_rep_seqs.qza" \
--o-table "~{joined_end_basename}_table.qza" \
--o-stats "~{joined_end_basename}_stats.qza"
--o-representative-sequences "rep_seqs.qza" \
--o-table "table.qza" \
--o-stats "stats.qza"

#Generate feature table- give you the number of features per sample
qiime feature-table summarize \
--i-table "~{joined_end_basename}_table.qza" \
--o-visualization "~{joined_end_basename}_table.qzv"
--i-table "table.qza" \
--o-visualization "table.qzv"
#Generate visualization of deblur stats
qiime deblur visualize-stats \
--i-deblur-stats "~{joined_end_basename}_stats.qza" \
--o-visualization "~{joined_end_basename}_stats.qzv"
--i-deblur-stats "stats.qza" \
--o-visualization "stats.qzv"
>>>
output {
File representative_seqs_qza = "~{joined_end_basename}_rep_seqs.qza"
File representative_table_qza = "~{joined_end_basename}_table.qza"
File feature_table = "~{joined_end_basename}_table.qzv"
File visualize_stats = "~{joined_end_basename}_stats.qzv"
File representative_seqs_qza = "rep_seqs.qza"
File representative_table_qza = "table.qza"
File feature_table = "table.qzv"
File visualize_stats = "stats.qzv"

}
runtime {
@@ -259,7 +261,6 @@ task tax_analysis {
File trained_classifier
File representative_seqs_qza
File representative_table_qza
String basename = basename(trained_classifier, '.qza')
Int memory_mb = 5
Int cpu = 1
Int disk_size_gb = 375
@@ -270,26 +271,26 @@
qiime feature-classifier classify-sklearn \
--i-classifier ~{trained_classifier} \
--i-reads ~{representative_seqs_qza} \
--o-classification "~{basename}_tax.qza"
--o-classification "taxonomy.qza"

qiime feature-table tabulate-seqs \
--i-data ~{representative_seqs_qza} \
--o-visualization "~{basename}_rep_seqs.qzv"
--o-visualization "list_rep_seqs.qzv"

qiime taxa barplot \
--i-table ~{representative_table_qza} \
--i-taxonomy "~{basename}_tax.qza" \
--o-visualization "~{basename}_bar_plots.qzv"
--i-taxonomy "taxonomy.qza" \
--o-visualization "taxa_bar_plots.qzv"
>>>
output {
File rep_seq_list = "~{basename}_rep_seqs.qzv"
File tax_classification_graph = "~{basename}_bar_plots.qzv"
File rep_seq_list = "list_rep_seqs.qzv"
File tax_classification_graph = "taxa_bar_plots.qzv"
}
runtime {
docker: docker
memory: "7 GB"
memory: "10 GB"
cpu: cpu
disk: disk_size_gb + " GB"
disks: "local-disk " + disk_size_gb + " HDD"
}
}
}
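One detail worth flagging in the import task: the dynamic `disk_size_gb = ceil(2*size(reads_bam, "GiB")) + 5` was replaced with the hardcoded `ceil(2*20) + 5` (45 GB). WDL's `size()` also accepts an `Array[File]` and returns the summed size, so the dynamic expression could likely be kept for the array input. A sketch of the arithmetic that expression encodes, in integer shell math over two placeholder 1 MiB files (hypothetical names, not real pipeline inputs):

```shell
# Integer-arithmetic equivalent of ceil(2 * size(files, "GiB")) + 5,
# computed over placeholder files instead of real BAMs.
set -e
cd "$(mktemp -d)"
dd if=/dev/zero of=a.bam bs=1024 count=1024 2>/dev/null  # 1 MiB placeholder
dd if=/dev/zero of=b.bam bs=1024 count=1024 2>/dev/null  # 1 MiB placeholder

total_bytes=0
for f in a.bam b.bam; do
    total_bytes=$(( total_bytes + $(wc -c < "$f") ))
done

GIB=1073741824
# ceil(2*total_bytes / GiB) + 5, done in integers: add (GiB - 1) before dividing
disk_size_gb=$(( (2 * total_bytes + GIB - 1) / GIB + 5 ))
echo "$disk_size_gb"
```

For the two 1 MiB placeholders this evaluates to 6 GB; real multi-sample BAM batches would scale the same way instead of being pinned at 45 GB.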
8 changes: 3 additions & 5 deletions pipes/WDL/workflows/amplicon16S_analysis.wdl
@@ -11,16 +11,14 @@ workflow amplicon16S_analysis {
allowNestedInputs: true
}
input {
File reads_bam
Array[File] reads_bam
File trained_classifier
String sample_name
Boolean keep_untrimmed_reads
}

call qiime.qiime_import_from_bam {
input:
reads_bam = reads_bam,
sample_name = sample_name
reads_bam = reads_bam
}
#__________________________________________
call qiime.trim_reads {
6 changes: 2 additions & 4 deletions pipes/WDL/workflows/qiime_import_bam.wdl
@@ -11,13 +11,11 @@ meta{
allowNestedInputs: true
}
input {
File reads_bam
String sample_name
Array[File] reads_bam
}

call infile.qiime_import_from_bam {
input:
reads_bam = reads_bam,
sample_name = sample_name
reads_bam = reads_bam
}
}
3 changes: 1 addition & 2 deletions test/input/WDL/test_inputs-qiime_import_bam-local.json
@@ -1,4 +1,3 @@
{
"qiime_import_bam.reads_bam": "test/input/G5012.3.subset.bam",
"qiime_import_bam.sample_name": "G5012.3.subset.bam"
"qiime_import_bam.reads_bam": ["test/input/G5012.3.subset.bam"]
}
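With `reads_bam` converted to `Array[File]`, batching more samples into a run is just a matter of listing more BAM paths in the inputs JSON. A sketch (the second path is hypothetical, added only to show the multi-sample shape):

```json
{
  "qiime_import_bam.reads_bam": [
    "test/input/G5012.3.subset.bam",
    "test/input/another_sample.bam"
  ]
}
```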