Adds modules nextflow pseudocode #79

Merged: 40 commits, merged on Nov 24, 2020

5080c4d Adds nf-core template for nextflow pips (cgpu, Nov 7, 2020)
2998236 Absorbs dev latest changes (cgpu, Nov 7, 2020)
137aa41 Cleans up template main.nf and adds swag cli message (cgpu, Nov 7, 2020)
eac977e Updates nextflow.config (cgpu, Nov 7, 2020)
6bcc81a Adds Dockerfile and env yaml updates (cgpu, Nov 7, 2020)
9075836 Removes redundant files from assets (cgpu, Nov 7, 2020)
780a115 Deleted nf schema json (cgpu, Nov 7, 2020)
420340e Removes redundant configs (cgpu, Nov 7, 2020)
4495b05 Updates README with template structure (cgpu, Nov 7, 2020)
2a785f3 Updates docs/ (cgpu, Nov 7, 2020)
75571d3 Updates repo name in changelog (cgpu, Nov 7, 2020)
492d1ae Updates template test.config (cgpu, Nov 7, 2020)
da3687d Adds bin folder and template wrapper R script (cgpu, Nov 7, 2020)
35de804 Adds pbccs in env.yml (cgpu, Nov 7, 2020)
5638c54 Changes the location of pipeline info, logs (cgpu, Nov 7, 2020)
8fdecfd Adds .github folder (cgpu, Nov 7, 2020)
6f812e9 Merge branch 'adds-nextflow-boilerplate' of https://github.com/sheynk… (cgpu, Nov 7, 2020)
6fcae1c Removes redendant files from GH actions (cgpu, Nov 7, 2020)
2ae1719 Updates CONTRIBUTING.md (cgpu, Nov 7, 2020)
dccbaad Updates ISSUE_TEMPLATE (cgpu, Nov 7, 2020)
e5c1ab3 Update PULL_REQUEST_TEMPLATE.md (cgpu, Nov 7, 2020)
5fb8e1b Removes AWS tests (cgpu, Nov 7, 2020)
d7da707 Adds misspelling test (cgpu, Nov 7, 2020)
91ea8d9 Removes linting.yml (cgpu, Nov 7, 2020)
2ae3e35 Corrects typo (cgpu, Nov 7, 2020)
10f556e Removes igenomes config (cgpu, Nov 7, 2020)
02fb11d Merge branch 'adds-nextflow-boilerplate' of https://github.com/sheynk… (cgpu, Nov 7, 2020)
31744bb Fixes typos caught by review-dog (cgpu, Nov 8, 2020)
2388635 Adds tentative LICENSE (cgpu, Nov 8, 2020)
50b8058 Adds environment.yml with pandas, numpy, biopython (cgpu, Nov 8, 2020)
7672e16 Adds CCS process (cgpu, Nov 8, 2020)
f04dc7c Adds pbbam (required for ccs --chunk subsequent routine) (cgpu, Nov 8, 2020)
df6bd40 Adds pbindex, ccs processes (w/ parallel --chunks) (cgpu, Nov 8, 2020)
f9b6153 Removes redundant bai (pbi is needed) (cgpu, Nov 9, 2020)
76ab7a8 Adds temp process mock ccs and flag for testing (cgpu, Nov 9, 2020)
e860d54 Absorbs latest changes from dev (cgpu, Nov 24, 2020)
5120695 Corrects typo caught by reviewdog gh-action (cgpu, Nov 24, 2020)
34e3db5 Typo fix (cgpu, Nov 24, 2020)
c5c8317 Deletes commented out section (cgpu, Nov 24, 2020)
38a5079 Makes the section note more informative (cgpu, Nov 24, 2020)

19 changes: 19 additions & 0 deletions conf/test.config
@@ -0,0 +1,19 @@
/*
 * -------------------------------------------------
 *  Nextflow config file for running tests
 * -------------------------------------------------
 * Defines bundled input files and everything required
 * to run a fast and simple test. Use as follows:
 *   nextflow run sheynkman-lab/Long-Read-Proteogenomics -profile test,<docker/singularity>
 */

params {
  config_profile_name = 'Test profile'
  config_profile_description = 'Minimal test dataset to check pipeline function'
  // Limit resources so that this can run on GitHub Actions
  max_cpus = 2
  max_memory = 6.GB
  max_time = 48.h

  // Input data
}
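
The `// Input data` section above is still empty. A minimal sketch of how it might be filled, assuming the bundled `testdata` folder (the default for `pb_bams_folder` in `nextflow.config`) holds the test subreads BAMs; the chunk count and the use of `mock_ccs` are illustrative choices, not values taken from this PR:

```groovy
params {
  // Input data (hypothetical values for the test profile)
  pb_bams_folder       = 'testdata'  // folder containing PacBio *.bam subreads files
  number_of_ccs_chunks = 2           // hypothetical: fewer chunks than the default of 10
  mock_ccs             = true        // use the mock ccs process that only creates placeholder files
}
```

With values like these, `-profile test` would exercise the chunking and channel wiring while replacing the real `ccs` run with the mock process.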
33 changes: 26 additions & 7 deletions environment.yml
@@ -6,11 +6,30 @@ channels:
- bioconda
- defaults
dependencies:
- conda-forge::python=3.7.3
- conda-forge::markdown=3.1.1
- conda-forge::pymdown-extensions=6.0
- conda-forge::pygments=2.5.2
- bioconda::fastqc=0.11.8
- bioconda::multiqc=1.7
# General utils
- python=3.7.3
- markdown=3.1.1
- pymdown-extensions=6.0
- pygments=2.5.2
- multiqc=1.7
- biopython
# Module 1: SMARTLink - CCS
- pbccs
- pbbam
# Module 2: Iso-Seq 3
- isoseq3
- lima
- pbmm2
- pbcoretools
- bamtools
# Module 3: SQANTI3, separate docker image
# Module 4: CPAT
# Module 5: 6 Frame Translation
- biopython
# Module 6: Transcriptome Summary
- numpy
- pandas
# Module 7: ORF Calling
# Module 8: Refined Db Generation
# Module 9: Db Annotation
# Module 10: MetaMorpheus
120 changes: 120 additions & 0 deletions main.nf
@@ -74,6 +74,38 @@ summary['Config Profile'] = workflow.profile
log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n")
log.info "-\033[2m--------------------------------------------------\033[0m-"

/*
* Configuring channels based on input parameters
*/

// Fail early: nothing to analyze if the user does not provide an input pb_bams_folder
if (!params.pb_bams_folder) {
    exit 1, "Please provide an input folder with --pb_bams_folder to proceed, see --help for more information"
}

if (params.pb_bams_folder && hasExtension(params.pb_bams_folder, "tar.gz")) {
    // NOTE: ch_pb_bams_folder_tar_gz is not consumed yet, and ch_pb_bams_folder is never
    // assigned for a .tar.gz input, so the .set {} call below fails with a missing-variable error
    ch_pb_bams_folder_tar_gz = Channel.fromPath(params.pb_bams_folder)
}

if (params.pb_bams_folder && !hasExtension(params.pb_bams_folder, "tar.gz")) {
    // ch_pb_bams_folder = params.pb_bams_folder ? Channel.fromFilePairs("${params.pb_bams_folder}/*.{bam,${params.bai_suffix}}", flat: true) : null
    ch_pb_bams_folder = Channel.fromPath("${params.pb_bams_folder}/*.bam")
}

// If the user has provided an input folder
if (params.pb_bams_folder) {
    ch_pb_bams_folder
        .set { ch_pb_subreads_bams }
}

(ch_pb_subreads_bams_for_pbi,
 ch_pb_subreads_bams_to_display) = ch_pb_subreads_bams.into(2)

ch_pb_subreads_bams_to_display.view()

// One channel item per ccs chunk index: 1, 2, ..., number_of_ccs_chunks
ch_ccs_chunks = Channel.from(1..params.number_of_ccs_chunks.toString().toInteger())
(ch_ccs_chunks, ch_ccs_chunks_to_display) = ch_ccs_chunks.into(2)

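Further down, `ch_ccs_chunks` is combined with the indexed BAM tuples emitted by `generate_pbi`, so each (chunk, sample) pair becomes one `ccs` task. The plain-Groovy sketch below emulates the shape of those tuples with ordinary lists; `combineChunks` is a hypothetical helper, not a Nextflow operator, and unlike a real channel it returns items in a fixed order:

```groovy
// Hypothetical helper: emulates, with plain lists, the tuples emitted by
// ch_ccs_chunks.combine(ch_pb_subreads_bams_for_ccs) in main.nf.
// Each chunk index is prepended to each [sample, bam, pbi] tuple.
def combineChunks(List chunks, List<List> indexedBams) {
    chunks.collectMany { chunk ->
        indexedBams.collect { row -> [chunk] + row }
    }
}

def chunks = (1..3).toList()
def indexedBams = [['movie', 'movie.subreads.bam', 'movie.subreads.bam.pbi']]

combineChunks(chunks, indexedBams).each { println it }
// [1, movie, movie.subreads.bam, movie.subreads.bam.pbi]
// [2, movie, movie.subreads.bam, movie.subreads.bam.pbi]
// [3, movie, movie.subreads.bam, movie.subreads.bam.pbi]
```

The element order matches the `set val(ith_chunk), val(sample), file(...), file(...)` input declaration of the ccs processes, which is why the chunk index comes first.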
/*
* STEP - validate template
*/
@@ -90,6 +122,86 @@ process validate {
"""
}

/*
* Module 1: SMARTLink - CCS
*/

// Generate the .pbi index required for ccs --chunk parallelisation
process generate_pbi {
    tag "${pb_subreads_bam.simpleName}"
    cpus 1
    echo true

    input:
    file(pb_subreads_bam) from ch_pb_subreads_bams_for_pbi

    output:
    set val("${pb_subreads_bam.simpleName}"),
        file("${pb_subreads_bam.baseName}.bam"),
        file("${pb_subreads_bam.baseName}.bam.pbi") into ch_pb_subreads_bams_for_ccs

    script:
    """
    pbindex ${pb_subreads_bam}
    """
}

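`generate_pbi` uses two different name accessors: `simpleName` (everything before the first dot) becomes the sample ID and task tag, while `baseName` (the name minus its last extension) rebuilds the BAM and `.pbi` file names. The sketch below emulates both on plain strings; `simpleName()` and `baseName()` here are hypothetical stand-ins for Nextflow's `Path` extensions, assuming bare file names with no directory part:

```groovy
// Hypothetical stand-ins for Nextflow's Path.simpleName and Path.baseName,
// applied to a bare file name (assumption: no directory part, no leading dot).
String simpleName(String fileName) {
    int dot = fileName.indexOf('.')
    return dot == -1 ? fileName : fileName.substring(0, dot)
}

String baseName(String fileName) {
    int dot = fileName.lastIndexOf('.')
    return dot == -1 ? fileName : fileName.substring(0, dot)
}

def bam = 'movie.subreads.bam'
println "sample/tag : ${simpleName(bam)}"        // movie
println "BAM output : ${baseName(bam)}.bam"      // movie.subreads.bam
println "PBI output : ${baseName(bam)}.bam.pbi"  // movie.subreads.bam.pbi
```

Because the sample ID is cut at the first dot, two input BAMs that share that prefix would get the same `sample` value and therefore identical `${sample}.ccs.${ith_chunk}.bam` names in the publish directory.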
// Pair every chunk index with every indexed subreads BAM tuple: [ith_chunk, sample, bam, pbi]
ch_ccs_chunked_bams = ch_ccs_chunks.combine(ch_pb_subreads_bams_for_ccs)

if (!params.mock_ccs) {
    process smartlink_ccs {
        tag "sample:${sample},chunk:${ith_chunk}"
        publishDir "${params.outdir}/smartlink_ccs/", mode: params.publish_dir_mode
        cpus 1

        input:
        set val(ith_chunk), val(sample), file(pb_subreads_bam), file(pb_subreads_pbi) from ch_ccs_chunked_bams

        output:
        set val("${sample}"),
            file("${sample}.ccs.${ith_chunk}.bam"),
            file("${sample}.ccs.${ith_chunk}.bam.pbi") into ch_ccs_pacbio_bams

        script:
        // Usage example from the ccs documentation:
        // ccs movie.subreads.bam movie.ccs.1.bam --chunk 1/10 -j <THREADS>
        """
        ccs ${pb_subreads_bam} ${sample}.ccs.${ith_chunk}.bam --chunk ${ith_chunk}/${params.number_of_ccs_chunks} -j ${task.cpus}
        """
    }
} else {
    // Mock for testing: skips ccs and creates empty placeholder output files
    process smartlink_ccs_mock {
        tag "sample:${sample},chunk:${ith_chunk}"
        publishDir "${params.outdir}/smartlink_ccs/", mode: params.publish_dir_mode
        cpus 1

        input:
        set val(ith_chunk), val(sample), file(pb_subreads_bam), file(pb_subreads_pbi) from ch_ccs_chunked_bams

        output:
        set val("${sample}"),
            file("${sample}.ccs.${ith_chunk}.bam"),
            file("${sample}.ccs.${ith_chunk}.bam.pbi") into ch_ccs_pacbio_bams

        script:
        """
        # ccs ${pb_subreads_bam} ${sample}.ccs.${ith_chunk}.bam --chunk ${ith_chunk}/${params.number_of_ccs_chunks} -j ${task.cpus}
        touch ${sample}.ccs.${ith_chunk}.bam ${sample}.ccs.${ith_chunk}.bam.bai ${sample}.ccs.${ith_chunk}.bam.pbi
        """
    }
}

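The ccs processes above emit one BAM per chunk and sample on `ch_ccs_pacbio_bams`, and nothing in this diff consumes them yet. The ccs documentation describes merging chunk outputs with `pbmerge` and re-indexing with `pbindex` (both provided by the `pbbam` package listed in `environment.yml`). As a hedged sketch, the helper below groups per-chunk outputs by sample, roughly what `groupTuple()` would do on the channel, and builds those commands as strings; `mergeCommands` and the file names are hypothetical:

```groovy
// Hypothetical helper: groups per-chunk ccs outputs by sample (similar in spirit to
// groupTuple() on ch_ccs_pacbio_bams) and builds one merge + index command per sample,
// following the pbmerge/pbindex steps described in the ccs documentation.
List<String> mergeCommands(List<List<String>> chunkOutputs) {
    chunkOutputs
        .groupBy { row -> row[0] }                  // sample -> [[sample, bam], ...]
        .collect { sample, rows ->
            def bams = rows.collect { row -> row[1] }.join(' ')
            "pbmerge -o ${sample}.ccs.bam ${bams} && pbindex ${sample}.ccs.bam".toString()
        }
}

def chunkOutputs = [
    ['movie', 'movie.ccs.1.bam'],
    ['movie', 'movie.ccs.2.bam'],
]
mergeCommands(chunkOutputs).each { println it }
// pbmerge -o movie.ccs.bam movie.ccs.1.bam movie.ccs.2.bam && pbindex movie.ccs.bam
```

In a pipeline, a string like this would sit in the `script:` block of a merge process fed by `ch_ccs_pacbio_bams.groupTuple()`.
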
def logHeader() {
// Log colors ANSI codes
c_black = params.monochrome_logs ? '' : "\033[0;30m";
@@ -128,3 +240,11 @@ def logHeader() {
-${c_dim}--------------------------------------------------${c_reset}-
""".stripIndent()
}

// Functions
// Most of these functions are adapted from the nf-core/sarek developers: https://github.com/nf-core/sarek

// Check whether a path ends with the given extension (case-insensitive suffix match)
def hasExtension(it, extension) {
    it.toString().toLowerCase().endsWith(extension.toLowerCase())
}
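
`hasExtension` is a plain suffix test, and it decides which branch of the channel setup an input folder takes. The standalone copy below (identical logic) is run on hypothetical paths; note that it also matches a name such as `mytar.gz`, which passing `'.tar.gz'` (with the leading dot) would exclude:

```groovy
// Standalone copy of the hasExtension helper from main.nf (same logic),
// run on hypothetical paths to show which channel-setup branch each would take.
def hasExtension(it, extension) {
    it.toString().toLowerCase().endsWith(extension.toLowerCase())
}

['runs/Movies.TAR.GZ', 'testdata', 'mytar.gz'].each { path ->
    println "${path} -> tar.gz branch: ${hasExtension(path, 'tar.gz')}"
}
// runs/Movies.TAR.GZ -> tar.gz branch: true
// testdata -> tar.gz branch: false
// mytar.gz -> tar.gz branch: true
```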
14 changes: 12 additions & 2 deletions nextflow.config
@@ -20,11 +20,21 @@ params {
  max_cpus = 16
  max_time = 240.h

  // Module 1: SMARTLink - CCS
  bai_suffix = 'bam.bai' // CAUTION: be sure that you declare bam.bai or .bai explicitly
  pb_bams_folder = 'testdata'
  number_of_ccs_chunks = 10

}

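`main.nf` switches between `smartlink_ccs` and `smartlink_ccs_mock` on `params.mock_ccs`, which the `params` block above does not declare. Nextflow treats an undeclared parameter as `null` (falsy), so the real process runs by default; declaring the default explicitly, as in this hypothetical fragment, documents the flag, and passing `--mock_ccs` on the command line sets it to `true`:

```groovy
params {
  // Module 1: SMARTLink - CCS
  mock_ccs = false  // hypothetical default; pass --mock_ccs to use the mock ccs process
}
```
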
// Container slug. Stable releases should specify release tag!
// Developmental code should specify :dev
process.container = 'sheynkmanlab/proteogenomics-base:dev'

docker.enabled = true

process {
  container = 'cgpu/proteogenomics:1.0dev'
}

profiles {
docker {
@@ -37,7 +47,7 @@ profiles {
podman {
podman.enabled = true
}
-  test { includeConfig 'conf/executors/test.config' }
+  test { includeConfig 'conf/test.config' }
}

// Export these variables to prevent local Python/R libraries from conflicting with those in the container