ASCAT #1332

Merged: 68 commits, Mar 15, 2022
Commits
2ac8f1f
First commit
Dec 15, 2021
8aa7277
putting correct links for singularity and docker containers (just had…
lassefolkersen Dec 17, 2021
4c2627a
adding first try of relevant commands (not working yet, just took the…
lassefolkersen Dec 17, 2021
667eed0
test commit
Feb 8, 2022
dfa8aed
remove test
Feb 8, 2022
c20a6b4
starting up work with module after 3.0.0 upgrade
Feb 9, 2022
6402f96
add ascat.prepareHTS statemet
lassefolkersen Feb 9, 2022
e69c9d4
add location of docker for new mulled alleleCounter+ASCAT container
lassefolkersen Feb 9, 2022
3090d24
first full run with ASCAT on HG00154.mapped.ILLUMINA.bwa.GBR.low_cove…
lassefolkersen Feb 21, 2022
341f7d0
add notes on dropbox download
lassefolkersen Feb 21, 2022
a7901fc
use a newer pytest_modules.yml
lassefolkersen Feb 21, 2022
24981fc
add outpit
lassefolkersen Feb 21, 2022
53b04ee
trying to align with current Sarek output
lassefolkersen Feb 21, 2022
50c095b
adding in FH comments
lassefolkersen Feb 22, 2022
e8d3729
busy clearing up arguments and testing. Still WIP
lassefolkersen Feb 22, 2022
9b820c8
first working run, in nextflow, with sarek-like output. Still needs m…
lassefolkersen Feb 22, 2022
cd959f6
cleaning up before writing up findings
lassefolkersen Feb 22, 2022
e332472
testing with putting in arguments in args
lassefolkersen Feb 22, 2022
7a3b6e9
draft for solution 3 style for arguments
lassefolkersen Feb 22, 2022
4c953bb
one more test added
lassefolkersen Feb 22, 2022
3f3b730
adding FH map
lassefolkersen Feb 22, 2022
43bd99c
finished testing maps for args
lassefolkersen Feb 22, 2022
44b1425
wrap-up cram/crai test successfully
lassefolkersen Feb 23, 2022
c941817
updates to address ability to put in ref.fasta argument for cram running
lassefolkersen Feb 24, 2022
67eed88
adding remaining import-HTS commands in as args, and removing the chr…
lassefolkersen Feb 25, 2022
dfad48a
first test with auto-downloading the s3-data (when not given as an ar…
lassefolkersen Mar 4, 2022
2d82d5f
removing download-logic for supporting files, documenting in meta.yml…
lassefolkersen Mar 7, 2022
1837770
adding mulled singularity container
lassefolkersen Mar 7, 2022
06cf3ba
removing tests
lassefolkersen Mar 7, 2022
318e12b
update to main branch
lassefolkersen Mar 7, 2022
60fe3cb
fix left padding lint issue
lassefolkersen Mar 7, 2022
f3bf3c3
lint failure in meta.yml
lassefolkersen Mar 7, 2022
c6c2faf
more linting errors
lassefolkersen Mar 7, 2022
3d14fb9
add when argument
lassefolkersen Mar 7, 2022
9527c0c
adding stub functionality
lassefolkersen Mar 8, 2022
0e53885
add stub run
lassefolkersen Mar 8, 2022
e61f0bd
correct md5sum for versions.yml
lassefolkersen Mar 8, 2022
d7c1cc0
more testing with -runstub
lassefolkersen Mar 8, 2022
ad84f20
stub code in pure bash - not mixed with R
lassefolkersen Mar 8, 2022
b90e81f
reformat version.yml
lassefolkersen Mar 8, 2022
d0edf47
get rid of absolute paths in test.yml
lassefolkersen Mar 8, 2022
e2dbc37
correct wrong md5sum
lassefolkersen Mar 8, 2022
0addc01
Merge latest version of github.com:nf-core/modules
lassefolkersen Mar 8, 2022
6888d22
adding allelecount conda link
lassefolkersen Mar 11, 2022
781ca7c
rename normal_bam to input_bam etc
lassefolkersen Mar 11, 2022
f7637ac
let the pipeline dev worry about matching the right loci and allele f…
lassefolkersen Mar 11, 2022
111be63
dont hardcode default genomebuild
lassefolkersen Mar 11, 2022
0711895
Merge github.com:nf-core/modules
lassefolkersen Mar 11, 2022
1f3aafd
adding download instruction comment
lassefolkersen Mar 11, 2022
1d7eeb1
add doi
lassefolkersen Mar 11, 2022
abaec74
fix conda addition bug
lassefolkersen Mar 11, 2022
d04d7bc
add args documentation
lassefolkersen Mar 11, 2022
e134545
test new indent
lassefolkersen Mar 11, 2022
284ff63
new test with meta.yml indentation
lassefolkersen Mar 11, 2022
7174c78
retry with new meta.yml
lassefolkersen Mar 11, 2022
55395dc
retry with new meta.yml - now with empty lines around
lassefolkersen Mar 11, 2022
5501a82
retry with new meta.yml - remove trailing whitepsace
lassefolkersen Mar 11, 2022
5cb19d8
trying to fix found quote character that cannot start any token error
lassefolkersen Mar 11, 2022
6f10da0
try with one empty line above triple-quote and no empty line below
lassefolkersen Mar 11, 2022
8e5da59
trying with pipe character
lassefolkersen Mar 11, 2022
dfffe72
checking if its the ending triple quote
lassefolkersen Mar 11, 2022
c499270
one more try with meta.yml
lassefolkersen Mar 14, 2022
098cb73
Merge github.com:nf-core/modules
lassefolkersen Mar 14, 2022
185eec1
test update bioconda versioning for linting failure
lassefolkersen Mar 14, 2022
dbfa7b1
test update bioconda versioning for linting failure 2
lassefolkersen Mar 14, 2022
4d4013f
testing allelecounter version error on conda
lassefolkersen Mar 14, 2022
285f7ae
Merge github.com:nf-core/modules
lassefolkersen Mar 14, 2022
e2abab7
Merge github.com:nf-core/modules
lassefolkersen Mar 15, 2022
155 changes: 155 additions & 0 deletions modules/ascat/main.nf
@@ -0,0 +1,155 @@
process ASCAT {
tag "$meta.id"
label 'process_medium'

conda (params.enable_conda ? "bioconda::ascat=3.0.0 bioconda::cancerit-allelecount=4.3.0": null)
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-c278c7398beb73294d78639a864352abef2931ce:dfe5aaa885de434adb2b490b68972c5840c6d761-0':
'quay.io/biocontainers/mulled-v2-c278c7398beb73294d78639a864352abef2931ce:dfe5aaa885de434adb2b490b68972c5840c6d761-0' }"

input:
tuple val(meta), path(input_normal), path(index_normal), path(input_tumor), path(index_tumor)
path(allele_files)
path(loci_files)

output:
tuple val(meta), path("*png"), emit: png
tuple val(meta), path("*cnvs.txt"), emit: cnvs
tuple val(meta), path("*purityploidy.txt"), emit: purityploidy
tuple val(meta), path("*segments.txt"), emit: segments
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: [:]
def prefix = task.ext.prefix ?: "${meta.id}"
def gender = args.gender ? "$args.gender" : "NULL"
def genomeVersion = args.genomeVersion ? "$args.genomeVersion" : "NULL"
def purity = args.purity ? "$args.purity" : "NULL"
def ploidy = args.ploidy ? "$args.ploidy" : "NULL"
def gc_files = args.gc_files ? "$args.gc_files" : "NULL"

def minCounts_arg = args.minCounts ? ",minCounts = $args.minCounts" : ""
def chrom_names_arg = args.chrom_names ? ",chrom_names = $args.chrom_names" : ""
def min_base_qual_arg = args.min_base_qual ? ",min_base_qual = $args.min_base_qual" : ""
def min_map_qual_arg = args.min_map_qual ? ",min_map_qual = $args.min_map_qual" : ""
def ref_fasta_arg = args.ref_fasta ? ",ref.fasta = '$args.ref_fasta'" : ""
def skip_allele_counting_tumour_arg = args.skip_allele_counting_tumour ? ",skip_allele_counting_tumour = $args.skip_allele_counting_tumour" : ""
def skip_allele_counting_normal_arg = args.skip_allele_counting_normal ? ",skip_allele_counting_normal = $args.skip_allele_counting_normal" : ""



"""
#!/usr/bin/env Rscript
library(RColorBrewer)
library(ASCAT)
options(bitmapType='cairo')


#prepare from BAM files
ascat.prepareHTS(
tumourseqfile = "$input_tumor",
normalseqfile = "$input_normal",
tumourname = "Tumour",
normalname = "Normal",
allelecounter_exe = "alleleCounter",
alleles.prefix = "$allele_files",
loci.prefix = "$loci_files",
gender = "$gender",
genomeVersion = "$genomeVersion",
nthreads = $task.cpus
$minCounts_arg
$chrom_names_arg
$min_base_qual_arg
$min_map_qual_arg
$ref_fasta_arg
$skip_allele_counting_tumour_arg
$skip_allele_counting_normal_arg
)


#Load the data
ascat.bc = ascat.loadData(
Tumor_LogR_file = "Tumour_tumourLogR.txt",
Tumor_BAF_file = "Tumour_tumourBAF.txt",
Germline_LogR_file = "Tumour_normalLogR.txt",
Germline_BAF_file = "Tumour_normalBAF.txt",
genomeVersion = "$genomeVersion",
gender = "$gender"
)

#optional GC wave correction
if(!is.null($gc_files)){
ascat.bc = ascat.GCcorrect(ascat.bc, $gc_files)
}

#Plot the raw data
ascat.plotRawData(ascat.bc)

#Segment the data
ascat.bc = ascat.aspcf(ascat.bc)

#Plot the segmented data
ascat.plotSegmentedData(ascat.bc)

#Run ASCAT to fit every tumor to a model, inferring ploidy, normal cell contamination, and discrete copy numbers
#If psi and rho are manually set:
if (!is.null($purity) && !is.null($ploidy)){
ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity, psi_manual=$ploidy)
} else if(!is.null($purity) && is.null($ploidy)){
ascat.output <- ascat.runAscat(ascat.bc, gamma=1, rho_manual=$purity)
} else if(!is.null($ploidy) && is.null($purity)){
ascat.output <- ascat.runAscat(ascat.bc, gamma=1, psi_manual=$ploidy)
} else {
ascat.output <- ascat.runAscat(ascat.bc, gamma=1)
}

#Write out segmented regions (including regions with one copy of each allele)
write.table(ascat.output[["segments"]], file=paste0("$prefix", ".segments.txt"), sep="\t", quote=F, row.names=F)

#Write out CNVs in bed format
cnvs=ascat.output[["segments"]][2:6]
write.table(cnvs, file=paste0("$prefix",".cnvs.txt"), sep="\t", quote=F, row.names=F, col.names=T)

#Write out purity and ploidy info
summary <- tryCatch({
matrix(c(ascat.output[["aberrantcellfraction"]], ascat.output[["ploidy"]]), ncol=2, byrow=TRUE)}, error = function(err) {
# error handler picks up where error was generated
print(paste("Could not find optimal solution: ",err))
return(matrix(c(0,0),nrow=1,ncol=2,byrow = TRUE))
}
)
colnames(summary) <- c("AberrantCellFraction","Ploidy")
write.table(summary, file=paste0("$prefix",".purityploidy.txt"), sep="\t", quote=F, row.names=F, col.names=T)

#version export. Have to hardcode process name and software name because
#won't run inside an R-block
version_file_path="versions.yml"
f <- file(version_file_path,"w")
writeLines("ASCAT:", f)
writeLines(" ascat: 3.0.0",f)
close(f)
"""


stub:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}.cnvs.txt
touch ${prefix}.purityploidy.txt
touch ${prefix}.segments.txt
touch Tumour.ASCATprofile.png
touch Tumour.ASPCF.png
touch Tumour.germline.png
touch Tumour.rawprofile.png
touch Tumour.sunrise.png
touch Tumour.tumour.png

echo 'ASCAT:' > versions.yml
echo ' ascat: 3.0.0' >> versions.yml
"""


}
92 changes: 92 additions & 0 deletions modules/ascat/meta.yml
@@ -0,0 +1,92 @@
name: ascat
description: Derive allele-specific copy number profiles of tumour cells.
keywords:
- copy number
tools:
- ascat:
description: ASCAT is a method to derive copy number profiles of tumour cells, accounting for normal cell admixture and tumour aneuploidy. ASCAT infers tumour purity (the fraction of tumour cells) and ploidy (the amount of DNA per tumour cell), expressed as multiples of haploid genomes from SNP array or massively parallel sequencing data, and calculates whole-genome allele-specific copy number profiles (the number of copies of both parental alleles for all SNP loci across the genome).
homepage: None
documentation: None
tool_dev_url: https://github.com/Crick-CancerGenomics/ascat
doi: "10.1093/bioinformatics/btaa538"
licence: ['GPL v3']

input:
- args:
type: map
description: |
Groovy Map containing tool parameters. MUST follow the structure/keywords below and be provided via modules.config. Parameters must be set between quotes. <optional> parameters can be removed from the map, if they are not set. For default values, please check the documentation above.

```
{
[
"gender": "XX",
"genomeVersion": "hg19",
"purity": <optional>,
"ploidy": <optional>,
"gc_files": <optional>,
"minCounts": <optional>,
"chrom_names": <optional>,
"min_base_qual": <optional>,
"min_map_qual": <optional>,
"ref_fasta": <optional>,
"skip_allele_counting_tumour": <optional>,
"skip_allele_counting_normal": <optional>
]
}
```

- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- input_normal:
type: file
description: BAM/CRAM/SAM file
pattern: "*.{bam,cram,sam}"
- index_normal:
type: file
description: index for input_normal
pattern: "*.{bai,crai}"
- input_tumor:
type: file
description: BAM/CRAM/SAM file
pattern: "*.{bam,cram,sam}"
- index_tumor:
type: file
description: index for input_tumor
pattern: "*.{bai,crai}"
- allele_files:
type: file
description: allele files for ASCAT. Can be downloaded here https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS
- loci_files:
type: file
description: loci files for ASCAT. Can be downloaded here https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS
output:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- versions:
type: file
description: File containing software versions
pattern: "versions.yml"
- png:
type: file
description: ASCAT plots
pattern: "*.{png}"
- cnvs:
type: file
description: copy number segments in BED-like format
pattern: "*.cnvs.txt"
- purityploidy:
type: file
description: purity and ploidy data
pattern: "*.purityploidy.txt"
- segments:
type: file
description: segments data
pattern: "*.segments.txt"
authors:
- "@aasNGC"
- "@lassefolkersen"
- "@FriederikeHanssen"
- "@maxulysse"
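
As a usage note for the `args` map documented in this meta.yml, here is a minimal sketch of how a pipeline might supply it. It assumes the nf-core convention of a `conf/modules.config` file; the file name and all values are illustrative assumptions, modelled on the test configuration in this PR. Values are given as quoted strings, as the description requires, and optional keys are simply left out.

```groovy
// Hypothetical conf/modules.config fragment; values are illustrative only.
process {
    withName: 'ASCAT' {
        ext.args = [
            gender        : 'XX',    // interpolated into R as gender = "XX"
            genomeVersion : 'hg19',
            purity        : '0.24',  // passed as rho_manual in main.nf
            ploidy        : '1.7'    // passed as psi_manual in main.nf
        ]
    }
}
```

With both `purity` and `ploidy` set, main.nf takes the first branch of its `ascat.runAscat` call; omitting both lets ASCAT fit them itself.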
4 changes: 4 additions & 0 deletions tests/config/pytest_modules.yml
@@ -46,6 +46,10 @@ artic/minion:
- modules/artic/minion/**
- tests/modules/artic/minion/**

ascat:
- modules/ascat/**
- tests/modules/ascat/**

assemblyscan:
- modules/assemblyscan/**
- tests/modules/assemblyscan/**
64 changes: 64 additions & 0 deletions tests/modules/ascat/main.nf
@@ -0,0 +1,64 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { ASCAT as ASCAT_SIMPLE} from '../../../modules/ascat/main.nf'
include { ASCAT as ASCAT_PLOIDY_AND_PURITY} from '../../../modules/ascat/main.nf'
include { ASCAT as ASCAT_CRAM} from '../../../modules/ascat/main.nf'




workflow test_ascat {
input = [
[ id:'test', single_end:false ], // meta map
file(params.test_data['homo_sapiens']['illumina']['test_paired_end_sorted_bam'], checkIfExists: true),
file(params.test_data['homo_sapiens']['illumina']['test_paired_end_sorted_bam_bai'], checkIfExists: true),
file(params.test_data['homo_sapiens']['illumina']['test2_paired_end_sorted_bam'], checkIfExists: true),
file(params.test_data['homo_sapiens']['illumina']['test2_paired_end_sorted_bam_bai'], checkIfExists: true)
]

ASCAT_SIMPLE ( input , [], [])
}





// extended tests running with 1000 genomes data. Data is downloaded as follows:
// wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00154/alignment/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
// wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00154/alignment/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
// wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00155/alignment/HG00155.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam
// wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/data/HG00155/alignment/HG00155.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai
//workflow test_ascat_with_ploidy_and_purity {
// input = [
// [ id:'test', single_end:false ], // meta map
// file("/home/ec2-user/input_files/bams/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam", checkIfExists: true),
// file("/home/ec2-user/input_files/bams/HG00154.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam.bai", checkIfExists: true),
// file("/home/ec2-user/input_files/bams/test2.bam", checkIfExists: true),
// file("/home/ec2-user/input_files/bams/test2.bam.bai", checkIfExists: true)
// ]
//
// ASCAT_PLOIDY_AND_PURITY ( input , "/home/ec2-user/input_files/allele_files/G1000_alleles_hg19_chr", "/home/ec2-user/input_files/loci_files/G1000_alleles_hg19_chr")
//}


// extended tests running with 1000 genomes data. Data is downloaded as follows:
// wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00145/alignment/HG00145.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram.crai
// wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00145/alignment/HG00145.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram
// wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00146/alignment/HG00146.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram.crai
// wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00146/alignment/HG00146.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram
//workflow test_ascat_with_crams {
// input = [
// [ id:'test', single_end:false ], // meta map
// file("/home/ec2-user/input_files/crams/HG00145.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram", checkIfExists: true),
// file("/home/ec2-user/input_files/crams/HG00145.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram.crai", checkIfExists: true),
// file("/home/ec2-user/input_files/crams/duplicate_test.cram", checkIfExists: true),
// file("/home/ec2-user/input_files/crams/duplicate_test.cram.crai", checkIfExists: true)
// ]
//
// ASCAT_CRAM ( input , "/home/ec2-user/input_files/allele_files/G1000_alleles_hg19_chr", "/home/ec2-user/input_files/loci_files/G1000_alleles_hg19_chr")
//}



39 changes: 39 additions & 0 deletions tests/modules/ascat/nextflow.config
@@ -0,0 +1,39 @@
process {

publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }


withName: ASCAT_SIMPLE {
ext.args = [
gender : 'XY',
genomeVersion : 'hg19',
minCounts : '1',
min_base_qual : '1',
min_map_qual : '1',
chrom_names : 'c("21","22")'
]
}



withName: ASCAT_PLOIDY_AND_PURITY {
ext.args = [
gender : 'XX',
genomeVersion : 'hg19',
ploidy : '1.7',
purity : '0.24',
chrom_names : 'c("21","22")'
]
}

withName: ASCAT_CRAM {
ext.args = [
gender : 'XX',
genomeVersion : 'hg19',
ref_fasta : '/home/ec2-user/input_files/fasta/human_g1k_v37.fasta',
chrom_names : 'c("21","22")'
]
}

}
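
One detail of main.nf is worth illustrating alongside this config: `gc_files` and `chrom_names` are interpolated into the R script verbatim, while `ref_fasta` is wrapped in single quotes by the module itself. A minimal sketch of what that implies, assuming a hypothetical `ASCAT_GC` alias and placeholder paths (neither exists in this PR): the `gc_files` value has to carry its own R quotes, in the same way that `chrom_names` carries a complete R expression.

```groovy
// Hypothetical config fragment; ASCAT_GC and the paths are placeholders.
process {
    withName: ASCAT_GC {
        ext.args = [
            gender        : 'XX',
            genomeVersion : 'hg19',
            // Inserted verbatim into ascat.GCcorrect(ascat.bc, ...), so the
            // R string quotes must be part of the value.
            gc_files      : '"/path/to/gc_correction_file.txt"',
            // main.nf adds the quotes itself: ref.fasta = '...'
            ref_fasta     : '/path/to/reference.fasta',
            // A complete R expression, also inserted verbatim.
            chrom_names   : 'c("21","22")'
        ]
    }
}
```

Because the GC file is passed through `args` rather than declared as a process input, the path would presumably need to be resolvable from inside the task environment. That is an observation about the module as written, not something this PR tests.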

25 changes: 25 additions & 0 deletions tests/modules/ascat/test.yml
@@ -0,0 +1,25 @@
- name: ascat test_ascat
command: nextflow run tests/modules/ascat -entry test_ascat -c tests/config/nextflow.config -stub-run
tags:
- ascat
files:
- path: output/ascat/Tumour.ASCATprofile.png
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/Tumour.ASPCF.png
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/Tumour.germline.png
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/Tumour.rawprofile.png
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/Tumour.sunrise.png
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/Tumour.tumour.png
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/test.cnvs.txt
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/test.purityploidy.txt
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/test.segments.txt
md5sum: d41d8cd98f00b204e9800998ecf8427e
- path: output/ascat/versions.yml
md5sum: 1af20694ec11004c4f8bc0c609b06386