Skip to content

Commit

Permalink
Merge pull request #1109 from FriederikeHanssen/mpileup
Browse files Browse the repository at this point in the history
fix bcftools mpileup
  • Loading branch information
FriederikeHanssen authored Jun 15, 2023
2 parents 62aa7ee + 2e2397d commit 8e5bb2d
Show file tree
Hide file tree
Showing 8 changed files with 73 additions and 52 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ Vuoinesluobbalah is a lake close to Bierikjávrre.
- [#1101](https://github.com/nf-core/sarek/pull/1101) - Fix GATK4 version for GATK4 MarkduplicatesSpark [#1068](https://github.com/nf-core/sarek/issues/1068)
- [#1105](https://github.com/nf-core/sarek/pull/1105) - Remove `params.tracedir`
- [#1108](https://github.com/nf-core/sarek/pull/1108) - Refactor bad prefix definition for vcf files [#938](https://github.com/nf-core/sarek/issues/938)
- [#1109](https://github.com/nf-core/sarek/pull/1109) - Fix `mpileup` for variantcalling: only `bcftools` run and file publishing

## [3.2.1](https://github.com/nf-core/sarek/releases/tag/3.2.1) - Pierikjaure

Expand Down
25 changes: 14 additions & 11 deletions conf/modules/mpileup.config
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,7 @@ process {

withName: 'CAT_MPILEUP' {
publishDir = [
enabled: true,
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/mpileup/${meta.id}/" },
pattern: "*{mpileup.gz}"
enabled: false
]
}

Expand All @@ -31,25 +28,31 @@ process {
ext.args3 = "-i 'count(GT==\"RR\")==0'" // only report non homozygous reference variants
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/mpileup/${meta.id}/" },
path: { "${params.outdir}/variant_calling/bcftools/${meta.id}/" },
pattern: "*{vcf.gz,vcf.gz.tbi}",
saveAs: { meta.num_intervals > 1 ? null : it }
]
}

withName: 'MERGE_BCFTOOLS_MPILEUP' {
ext.prefix = {"${meta.id}.bcftools"}
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/bcftools/${meta.id}/" },
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
}

withName: 'SAMTOOLS_MPILEUP' {
ext.when = { params.tools && (params.tools.split(',').contains('controlfreec') || params.tools.split(',').contains('mpileup')) }
ext.when = { params.tools && params.tools.split(',').contains('controlfreec') }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/mpileup/${meta.id}/" },
pattern: "*mpileup.gz",
saveAs: { meta.num_intervals > 1 ? null : it }
enabled: false
]

}

// PAIR_VARIANT_CALLING
if (params.tools && (params.tools.split(',').contains('controlfreec') || params.tools.split(',').contains('mpileup'))) {
if (params.tools && params.tools.split(',').contains('controlfreec')) {
withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_MPILEUP:SAMTOOLS_MPILEUP' {
ext.prefix = { meta.num_intervals <= 1 ? "${meta.id}.normal" : "${meta.id}_${intervals.simpleName}.normal" }
}
Expand Down
32 changes: 16 additions & 16 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [FreeBayes](#freebayes)
- [GATK HaplotypeCaller](#gatk-haplotypecaller)
- [GATK Mutect2](#gatk-mutect2)
- [samtools mpileup](#samtools-mpileup)
- [bcftools](#bcftools)
- [Strelka2](#strelka2)
- [Structural Variants](#structural-variants)
- [Manta](#manta)
Expand Down Expand Up @@ -279,6 +279,21 @@ If some results from a variant caller do not appear here, please check out the `

For single nucleotide variants (SNVs) and small indels, multiple tools are available for normal (germline), tumor-only, and tumor-normal (somatic) paired data. For a list of the appropriate tool(s) for the data and sequencing type at hand, please check [here](usage.md#which-tool).

#### bcftools

[bcftools mpileup](https://samtools.github.io/bcftools/bcftools.html#mpileup) generates pileup of a CRAM file, followed by [bcftools call](https://samtools.github.io/bcftools/bcftools.html#call) and filtered with `-i 'count(GT==\"RR\")==0`.
For further reading and documentation see the [bcftools manual](https://samtools.github.io/bcftools/howtos/variant-calling.html).

<details markdown="1">
<summary>Output files for all samples</summary>

**Output directory: `{outdir}/variantcalling/bcftools/<sample>/`**

- `<sample>.bcftools.vcf.gz` and `<sample>.bcftools.vcf.gz.tbi`
- VCF with tabix index

</details>

#### DeepVariant

[DeepVariant](https://github.com/google/deepvariant) is a deep learning-based variant caller that takes aligned reads, produces pileup image tensors from them, classifies each tensor using a convolutional neural network and finally reports the results in a standard VCF or gVCF file. For further documentation take a look [here](https://github.com/google/deepvariant/tree/r1.4/docs).
Expand Down Expand Up @@ -384,21 +399,6 @@ Files created:

</details>

#### samtools mpileup

[samtools mpileup](https://www.htslib.org/doc/samtools-mpileup.html) generates pileup of a CRAM file.
For further reading and documentation see the [samtools manual](https://www.htslib.org/doc/samtools-mpileup.html).

<details markdown="1">
<summary>Output files for all samples</summary>

**Output directory: `{outdir}/variantcalling/mpileup/<sample>/`**

- `<sample>.pileup.gz`
- The pileup format is a text-based format for summarizing the base calls of aligned reads to a reference sequence. Alignment records are grouped by sample (`SM`) identifiers in `@RG` header lines.

</details>

#### Strelka2

[Strelka2](https://github.com/Illumina/strelka) is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. For further reading and documentation see the [Strelka2 user guide](https://github.com/Illumina/strelka/blob/master/docs/userGuide/README.md). If [Strelka2](https://github.com/Illumina/strelka) is used for somatic variant calling and [Manta](https://github.com/Illumina/manta) is also specified in tools, the output candidate indels from [Manta](https://github.com/Illumina/manta) are used according to [Strelka Best Practices](https://github.com/Illumina/strelka/blob/master/docs/userGuide/README.md#somatic-configuration-example).
Expand Down
20 changes: 9 additions & 11 deletions subworkflows/local/bam_variant_calling_mpileup/main.nf
Original file line number Diff line number Diff line change
@@ -1,9 +1,7 @@
include { CAT_CAT as CAT_MPILEUP } from '../../../modules/nf-core/cat/cat/main'
include { BCFTOOLS_MPILEUP } from '../../../modules/nf-core/bcftools/mpileup/main'
include { SAMTOOLS_MPILEUP } from '../../../modules/nf-core/samtools/mpileup/main'
include { GATK4_MERGEVCFS } from '../../../modules/nf-core/gatk4/mergevcfs/main'


include { CAT_CAT as CAT_MPILEUP } from '../../../modules/nf-core/cat/cat/main'
include { BCFTOOLS_MPILEUP } from '../../../modules/nf-core/bcftools/mpileup/main'
include { SAMTOOLS_MPILEUP } from '../../../modules/nf-core/samtools/mpileup/main'
include { GATK4_MERGEVCFS as MERGE_BCFTOOLS_MPILEUP } from '../../../modules/nf-core/gatk4/mergevcfs/main'
workflow BAM_VARIANT_CALLING_MPILEUP {
take:
cram // channel: [mandatory] [ meta, cram, crai ]
Expand Down Expand Up @@ -44,20 +42,20 @@ workflow BAM_VARIANT_CALLING_MPILEUP {

// Merge VCF
vcf_to_merge = vcf_mpileup.intervals.map{ meta, vcf -> [ groupKey(meta, meta.num_intervals), vcf ] }.groupTuple()
GATK4_MERGEVCFS(vcf_to_merge, dict)
MERGE_BCFTOOLS_MPILEUP(vcf_to_merge, dict)

// Mix intervals and no_intervals channels together
mpileup = CAT_MPILEUP.out.file_out.mix(mpileup_samtools.no_intervals)
// add variantcaller to meta map and remove no longer necessary field: num_intervals
.map{ meta, mpileup -> [ meta - meta.subMap('num_intervals') + [ variantcaller:'mpileup' ], mpileup ] }
vcf = GATK4_MERGEVCFS.out.vcf.mix(vcf_mpileup.no_intervals)
.map{ meta, mpileup -> [ meta - meta.subMap('num_intervals') + [ variantcaller:'samtools' ], mpileup ] }
vcf = MERGE_BCFTOOLS_MPILEUP.out.vcf.mix(vcf_mpileup.no_intervals)
// add variantcaller to meta map and remove no longer necessary field: num_intervals
.map{ meta, vcf -> [ meta - meta.subMap('num_intervals') + [ variantcaller:'mpileup' ], vcf ] }
.map{ meta, vcf -> [ meta - meta.subMap('num_intervals') + [ variantcaller:'bcftools' ], vcf ] }

versions = versions.mix(SAMTOOLS_MPILEUP.out.versions)
versions = versions.mix(BCFTOOLS_MPILEUP.out.versions)
versions = versions.mix(CAT_MPILEUP.out.versions)
versions = versions.mix(GATK4_MERGEVCFS.out.versions)
versions = versions.mix(MERGE_BCFTOOLS_MPILEUP.out.versions)

emit:
mpileup
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ workflow BAM_VARIANT_CALLING_TUMOR_ONLY_ALL {
//TODO: Temporary until the if's can be removed and printing to terminal is prevented with "when" in the modules.config
vcf_freebayes = Channel.empty()
vcf_manta = Channel.empty()
vcf_mpileup = Channel.empty()
vcf_mutect2 = Channel.empty()
vcf_strelka = Channel.empty()
vcf_tiddit = Channel.empty()
Expand All @@ -53,7 +54,7 @@ workflow BAM_VARIANT_CALLING_TUMOR_ONLY_ALL {
fasta,
intervals
)

vcf_mpileup = BAM_VARIANT_CALLING_MPILEUP.out.vcf
versions = versions.mix(BAM_VARIANT_CALLING_MPILEUP.out.versions)
}

Expand Down Expand Up @@ -172,6 +173,7 @@ workflow BAM_VARIANT_CALLING_TUMOR_ONLY_ALL {
vcf_freebayes,
vcf_manta,
vcf_mutect2,
vcf_mpileup,
vcf_strelka,
vcf_tiddit
)
Expand All @@ -180,6 +182,7 @@ workflow BAM_VARIANT_CALLING_TUMOR_ONLY_ALL {
vcf_all
vcf_freebayes
vcf_manta
vcf_mpileup
vcf_mutect2
vcf_strelka
vcf_tiddit
Expand Down
10 changes: 5 additions & 5 deletions tests/test_controlfreec.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,9 @@
- path: results/variant_calling/controlfreec/sample4_vs_sample3/sample4_vs_sample3_ratio.png
# binary changes md5sums on reruns
- path: results/variant_calling/mpileup/sample4_vs_sample3/sample4_vs_sample3.normal.mpileup.gz
# binary changes md5sums on reruns
should_exist: false
- path: results/variant_calling/mpileup/sample4_vs_sample3/sample4_vs_sample3.tumor.mpileup.gz
# binary changes md5sums on reruns
should_exist: false
- path: results/cnvkit
should_exist: false
- name: Run variant calling on somatic samples with controlfreec without intervals
Expand Down Expand Up @@ -98,9 +98,9 @@
- path: results/variant_calling/controlfreec/sample4_vs_sample3/sample4_vs_sample3_sample.cpn
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: results/variant_calling/mpileup/sample4_vs_sample3/sample4_vs_sample3.normal.mpileup.gz
# binary changes md5sums on reruns
should_exist: false
- path: results/variant_calling/mpileup/sample4_vs_sample3/sample4_vs_sample3.tumor.mpileup.gz
# binary changes md5sums on reruns
should_exist: false
- path: results/controlfreec
should_exist: false
- path: results/mpileup
Expand Down Expand Up @@ -143,7 +143,7 @@
- path: results/variant_calling/controlfreec/sample2/sample2_sample.cpn
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: results/variant_calling/mpileup/sample2/sample2.tumor.mpileup.gz
# binary changes md5sums on reruns
should_exist: false
- path: results/controlfreec
should_exist: false
- path: results/mpileup
Expand Down
30 changes: 23 additions & 7 deletions tests/test_mpileup.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,12 @@
- mpileup
files:
- path: results/multiqc
- path: results/variant_calling/mpileup/sample2/sample2.tumor.mpileup.gz
- path: results/variant_calling/bcftools/sample2/sample2.bcftools.vcf.gz
# binary changes md5sums on reruns
- path: results/variant_calling/bcftools/sample2/sample2.bcftools.vcf.gz.tbi
# binary changes md5sums on reruns
- path: results/variant_calling/mpileup/sample2/sample2.tumor.mpileup.gz
should_exist: false
- path: results/mpileup
should_exist: false
- name: Run variant calling on tumor_only sample to test mpileup without intervals
Expand All @@ -23,8 +27,12 @@
md5sum: f3dac01ea66b95fe477446fde2d31489
- path: results/no_intervals.bed.gz.tbi
md5sum: f3dac01ea66b95fe477446fde2d31489
- path: results/variant_calling/mpileup/sample2/sample2.tumor.mpileup.gz
# binary changes md5sums on reruns
- path: results/variant_calling/bcftools/sample2/sample2.bcftools.vcf.gz
# binary changes md5sums on reruns
- path: results/variant_calling/bcftools/sample2/sample2.bcftools.vcf.gz.tbi
# binary changes md5sums on reruns
- path: results/variant_calling/mpileup/
should_exist: false
- path: results/mpileup
should_exist: false
- name: Run variant calling on germline sample to test mpileup
Expand All @@ -34,8 +42,12 @@
- mpileup
files:
- path: results/multiqc
- path: results/variant_calling/mpileup/sample1/sample1.normal.mpileup.gz
# binary changes md5sums on reruns
- path: results/variant_calling/bcftools/sample1/sample1.bcftools.vcf.gz
# binary changes md5sums on reruns
- path: results/variant_calling/bcftools/sample1/sample1.bcftools.vcf.gz.tbi
# binary changes md5sums on reruns
- path: results/variant_calling/mpileup/
should_exist: false
- path: results/mpileup
should_exist: false
- name: Run variant calling on germline sample to test mpileup without intervals
Expand All @@ -52,7 +64,11 @@
md5sum: f3dac01ea66b95fe477446fde2d31489
- path: results/no_intervals.bed.gz.tbi
md5sum: f3dac01ea66b95fe477446fde2d31489
- path: results/variant_calling/mpileup/sample1/sample1.normal.mpileup.gz
# binary changes md5sums on reruns
- path: results/variant_calling/bcftools/sample1/sample1.bcftools.vcf.gz
# binary changes md5sums on reruns
- path: results/variant_calling/bcftools/sample1/sample1.bcftools.vcf.gz.tbi
# binary changes md5sums on reruns
- path: results/variant_calling/mpileup/
should_exist: false
- path: results/mpileup
should_exist: false
2 changes: 1 addition & 1 deletion workflows/sarek.nf
Original file line number Diff line number Diff line change
Expand Up @@ -1011,7 +1011,7 @@ workflow SAREK {
vcf_to_annotate = vcf_to_annotate.mix(BAM_VARIANT_CALLING_GERMLINE_ALL.out.vcf_manta)
vcf_to_annotate = vcf_to_annotate.mix(BAM_VARIANT_CALLING_GERMLINE_ALL.out.vcf_strelka)
vcf_to_annotate = vcf_to_annotate.mix(BAM_VARIANT_CALLING_GERMLINE_ALL.out.vcf_tiddit)
// vcf_to_annotate = vcf_to_annotate.mix(BAM_VARIANT_CALLING_GERMLINE_ALL.out.vcf_mpileup) // Not annotated?
vcf_to_annotate = vcf_to_annotate.mix(BAM_VARIANT_CALLING_GERMLINE_ALL.out.vcf_mpileup)
vcf_to_annotate = vcf_to_annotate.mix(BAM_VARIANT_CALLING_TUMOR_ONLY_ALL.out.vcf_all)
vcf_to_annotate = vcf_to_annotate.mix(BAM_VARIANT_CALLING_SOMATIC_ALL.out.vcf_all)

Expand Down

0 comments on commit 8e5bb2d

Please sign in to comment.