add sortgff and gff3trimfasta modules #64

Open
wants to merge 3 commits into
base: jbrowse_indexer
39 changes: 39 additions & 0 deletions modules/ebi-metagenomics/gff3trimfasta/main.nf
@@ -0,0 +1,39 @@
process GFF3_TRIM_FASTA {
    tag "$meta.id"
    label 'process_single'

    container 'quay.io/biocontainers/gawk:4.1.3--0'

    input:
    tuple val(meta), path(tab)
Member
Suggested change
tuple val(meta), path(tab)
tuple val(meta), path(gff)


    output:
    // Glob matches the "${prefix}_trimmed.gff" file written by the script block
    tuple val(meta), path("*_trimmed.gff"), optional: true, emit: gff
Member
Suggested change
tuple val(meta), path("*._trimmed.gff"), optional: true, emit: gff
tuple val(meta), path("*._trimmed.gff"), optional: true, emit: trimmed_gff

    path "versions.yml"                   , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    # Print every line until the ##FASTA directive, then stop
    awk '/##FASTA/{exit}1' "$tab" > "${prefix}_trimmed.gff"

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        awk: \$(awk --version | sed '1!d; s/.*Awk //; s/,.*//')
    END_VERSIONS
    """

    stub:
    // prefix must be defined here as well: script-block variables are not visible in stub
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch "${prefix}_trimmed.gff"

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        awk: \$(awk --version | sed '1!d; s/.*Awk //; s/,.*//')
    END_VERSIONS
    """
}
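
For anyone checking the trimming logic locally, here is a minimal sketch of what the awk one-liner does outside Nextflow. The toy file name and its contents are made up for illustration and are not part of this PR.

```shell
# Hypothetical toy input: a header, one feature line, then an embedded FASTA section.
workdir=$(mktemp -d)
{
  printf '##gff-version 3\n'
  printf 'contig_1\tprodigal\tCDS\t10\t90\t.\t+\t0\tID=cds_1\n'
  printf '##FASTA\n'
  printf '>contig_1\n'
  printf 'ACGTACGT\n'
} > "$workdir/toy.gff"

# Same awk program as in the module: the bare 1 prints each line,
# and the /##FASTA/ rule exits before that directive is printed.
awk '/##FASTA/{exit}1' "$workdir/toy.gff" > "$workdir/toy_trimmed.gff"

# Shows only the header and the feature line.
cat "$workdir/toy_trimmed.gff"
```

Note that the pattern is unanchored, so any earlier line containing `##FASTA` would also stop the output; anchoring it as `/^##FASTA/` would be a stricter variant.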
48 changes: 48 additions & 0 deletions modules/ebi-metagenomics/gff3trimfasta/meta.yml
@@ -0,0 +1,48 @@
name: gff3trimfasta
description: Trim the FASTA section from a GFF3 file
keywords:
  - trim
  - gff
  - fasta
  - awk
tools:
  - awk:
      description: A program that you can use to select particular records in a file and perform operations upon them.
      homepage: https://www.gnu.org/software/gawk/
      documentation: https://www.gnu.org/software/gawk/manual/gawk.html
      licence: ["GPL-3.0-or-later"]
      identifier: biotools:awk
input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. [ id:'test', single_end:false ]
    - tab:
Member
tab... gff seems more appropriate as the name

Author
gff is used for output file

Member
sorted_gff should be used for the gff then

        type: file
        description: |
          GFF3 file that includes a FASTA section to be trimmed
        pattern: "*.{gff,gff3}"
output:
  - gff:
      - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. [ id:'test', single_end:false ]
      - "*_trimmed.gff":
          type: file
          description: GFF file with the FASTA section trimmed
          pattern: "*_trimmed.gff"
  - versions:
      - versions.yml:
          type: file
          description: File containing software versions
          pattern: "versions.yml"
authors:
  - "@SantiagoSanchezF"
  - "@tgurbich"
  - "@vikasguptaebi"
maintainers:
  - "@tgurbich"
  - "@vikasguptaebi"
40 changes: 40 additions & 0 deletions modules/ebi-metagenomics/jbrowse/sortgff/main.nf
@@ -0,0 +1,40 @@
process SORT_GFF {
    tag "$meta.id"
    label 'process_single'

    conda "${moduleDir}/environment.yml"
    container 'quay.io/biocontainers/coreutils:8.25--0'

    input:
    tuple val(meta), path(tab)

    output:
    // Glob matches the "${prefix}.sorted.gff" file written by the script block
    tuple val(meta), path("*.sorted.gff"), optional: true, emit: gff
    path "versions.yml"                  , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"

SantiagoSanchezF marked this conversation as resolved.
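    """
    # Keep header/comment lines first, then sort features by seqid and numeric start
    (grep "^#" $tab; grep -v "^#" $tab | sort -t"`printf '\t'`" -k1,1 -k4,4n) > ${prefix}.sorted.gff

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        grep: \$(grep --version | awk 'NR==1 {print \$NF}')
    END_VERSIONS
    """

    stub:
    // prefix must be defined here as well: script-block variables are not visible in stub
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.sorted.gff

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        grep: \$(grep --version | awk 'NR==1 {print \$NF}')
    END_VERSIONS
    """
}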
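
To make the intended ordering easier to review, here is a minimal sketch of the same grep/sort pipeline run on a small made-up GFF. The file names, feature lines, and the `tab` helper variable are illustrative assumptions, not taken from the PR.

```shell
# Hypothetical toy input: one header line and three unsorted feature lines.
workdir=$(mktemp -d)
tab=$(printf '\t')
{
  printf '##gff-version 3\n'
  printf 'contig_2\tsrc\tCDS\t5\t50\t.\t+\t0\tID=c\n'
  printf 'contig_1\tsrc\tCDS\t200\t300\t.\t+\t0\tID=b\n'
  printf 'contig_1\tsrc\tCDS\t10\t90\t.\t-\t0\tID=a\n'
} > "$workdir/toy.gff"

# Header lines are emitted first and untouched; feature lines are sorted
# by column 1 (seqid) and then numerically by column 4 (start).
(grep "^#" "$workdir/toy.gff"; grep -v "^#" "$workdir/toy.gff" \
  | sort -t"$tab" -k1,1 -k4,4n) > "$workdir/toy.sorted.gff"

# Show seqid and start of each sorted feature line.
grep -v "^#" "$workdir/toy.sorted.gff" | cut -f1,4
```

One thing to keep in mind: `grep "^#"` returns a non-zero status when the input has no header lines, which may matter if the task shell runs with `-e`.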
54 changes: 54 additions & 0 deletions modules/ebi-metagenomics/jbrowse/sortgff/meta.yml
@@ -0,0 +1,54 @@
name: sortgff
description: Sort a GFF file for JBrowse use
keywords:
  - gff
  - jbrowse
  - sort
  - grep
tools:
  - grep:
      description: Print lines matching a pattern
      homepage: https://www.gnu.org/software/grep/
      documentation: https://www.gnu.org/software/grep/manual/
      licence: ["GPL-3.0-or-later"]
      identifier: biotools:grep
  - sort:
      description: Sort lines of text files
      homepage: https://www.gnu.org/software/coreutils/
      documentation: https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
      licence: ["GPL-3.0-or-later"]
      identifier: biotools:sort
input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. [ id:'test', single_end:false ]
    - tab:
Member
I would use gff instead of tab

Author
I'll leave it as tab, as I'm using gff for the output. Changed the description to be more specific.

Member
It should be gff; tab doesn't really convey what the input is... I would change the output to be sorted_gff or similar.

        type: file
        description: |
          GFF file containing genomic annotations
        pattern: "*.{gff,gff3}"
output:
  - gff:
      - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. [ id:'test', single_end:false ]
      - "*.sorted.gff":
          type: file
          description: Sorted GFF file
          pattern: "*.sorted.gff"
  - versions:
      - versions.yml:
          type: file
          description: File containing software versions
          pattern: "versions.yml"
authors:
  - "@SantiagoSanchezF"
  - "@tgurbich"
  - "@vikasguptaebi"
maintainers:
  - "@tgurbich"
  - "@vikasguptaebi"