Add Repeatmasking workflow #198

Merged
merged 8 commits into from
Sep 21, 2023
Changes from 6 commits
11 changes: 11 additions & 0 deletions workflows/repeatmasking/.dockstore.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /RepeatMasking-Workflow.ga
  testParameterFiles:
  - /RepeatMasking-Workflow-tests.yml
  authors:
    - name: Romane Libouban
      email: [email protected]
5 changes: 5 additions & 0 deletions workflows/repeatmasking/.workflowhub.yml
@@ -0,0 +1,5 @@
version: '0.1'
registries:
- url: https://workflowhub.eu
  project: iwc
  workflow: RepeatMasking-Workflow./main
5 changes: 5 additions & 0 deletions workflows/repeatmasking/CHANGELOG.md
@@ -0,0 +1,5 @@
# Changelog

## [0.1]

Initial version of the RepeatMasking workflow for genomic sequencing data.
29 changes: 29 additions & 0 deletions workflows/repeatmasking/README.md
@@ -0,0 +1,29 @@
# RepeatMasking Workflow

This workflow uses RepeatModeler and RepeatMasker to identify and mask repeats in a genome.

- RepeatModeler is a software package for identifying and modeling de novo families of transposable elements (TEs). At the heart of RepeatModeler are three de novo repeat search programs (RECON, RepeatScout and LtrHarvest/Ltr_retriever) which use complementary computational methods to identify repeat element boundaries and family relationships from sequence data.

- RepeatMasker is a program that screens DNA sequences for *interspersed repeats* and *low-complexity* DNA sequences. Its output is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all annotated repeats are masked.

## Input dataset for RepeatModeler
- RepeatModeler requires a single input file: a genome in FASTA format.


## Output datasets for RepeatModeler
- Two output files are generated:
  - a FASTA file of consensus sequences for the repeat families found
  - a Stockholm file of seed alignments for those families


## Input dataset for RepeatMasker
- RepeatMasker requires the FASTA file generated by RepeatModeler.

## Output datasets for RepeatMasker
- Five output files are generated:
  - a masked FASTA file
  - a GFF file with the repeat annotation
  - a table summarizing the repeat content of the analyzed sequence
  - a file with statistics on the repeat content of the analyzed sequence
  - a summary of the mutation sites found and the order of grouping
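
The RepeatMasker step in this workflow has `"xsmall": true` in its `tool_state`, which should correspond to RepeatMasker's `-xsmall` option (repeats returned in lowercase rather than replaced). Under that assumption, a rough way to check how much of a masked FASTA file was flagged as repetitive is to count lowercase bases. The function name `softmasked_fraction` and the example sequences are hypothetical, and this is only a sketch, not part of the workflow:

```python
def softmasked_fraction(fasta_text: str) -> float:
    """Return the fraction of sequence characters that are lowercase.

    Assumes soft-masking (lowercase = repeat); header lines start with '>'.
    """
    total = 0
    masked = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        masked += sum(1 for ch in line if ch.islower())
    return masked / total if total else 0.0


# Hypothetical two-record FASTA: 12 of the 24 bases are lowercase.
example = ">contig1\nACGTacgtacgtACGT\n>contig2\nNNNNacgt\n"
print(softmasked_fraction(example))  # 0.5
```

Note that hard-masked bases (`N` or `X`) are uppercase and would not be counted by this sketch.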

@@ -0,0 +1,21 @@
- doc: Test outline for Repeat-masking-with-RepeatModeler-and-RepeatMasker
  job:
    input:
      class: File
      path: test-data/input.fasta
      filetype: fasta
  outputs:
    RepeatMasker masked genome:
      path: test-data/RepeatMasker masked genome.fasta
    RepeatMasker output log:
      path: test-data/RepeatMasker output log.tabular
    RepeatMasker repeat statistics:
      path: test-data/RepeatMasker repeat statistics.txt
    RepeatMasker repeat catalog:
      path: test-data/RepeatMasker repeat catalog.txt
    RepeatMasker repeat annotation:
      path: test-data/RepeatMasker repeat annotation.gff
    RepeatModeler consensus sequences:
      path: test-data/RepeatModeler consensus sequences.fasta
    RepeatModeler seeds alignments:
      path: test-data/RepeatModeler seeds alignments.stockholm
@@ -0,0 +1 @@
{"a_galaxy_workflow": "true", "annotation": "", "creator": [{"class": "Person", "email": "mailto:[email protected]", "name": "Romane Libouban"}], "format-version": "0.1", "license": "MIT", "name": "Repeat masking with RepeatModeler and RepeatMasker", "steps": {"0": {"annotation": "", "content_id": null, "errors": null, "id": 0, "input_connections": {}, "inputs": [{"description": "", "name": "input"}], "label": "input", "name": "Input dataset", "outputs": [], "position": {"left": 0, "top": 0}, "tool_id": null, "tool_state": "{\"optional\": false, \"tag\": null}", "tool_version": null, "type": "data_input", "uuid": "ab5e19b0-ce35-4e54-a55e-f75243c86e3d", "when": null, "workflow_outputs": []}, "1": {"annotation": "", "content_id": "toolshed.g2.bx.psu.edu/repos/csbl/repeatmodeler/repeatmodeler/2.0.4+galaxy1", "errors": null, "id": 1, "input_connections": {"input_file": {"id": 0, "output_name": "output"}}, "inputs": [], "label": null, "name": "RepeatModeler", "outputs": [{"name": "sequences", "type": "fasta"}, {"name": "seeds", "type": "stockholm"}], "position": {"left": 220, "top": 0}, "post_job_actions": {}, "tool_id": "toolshed.g2.bx.psu.edu/repos/csbl/repeatmodeler/repeatmodeler/2.0.4+galaxy1", "tool_shed_repository": {"changeset_revision": "8661b2607b7e", "name": "repeatmodeler", "owner": "csbl", "tool_shed": "toolshed.g2.bx.psu.edu"}, "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/shared/ifbstor1/galaxy/mutable-config/tool-data/shared/ucsc/chrom/?.len\", \"input_file\": null, \"__page__\": null, \"__rerun_remap_job_id__\": null}", "tool_version": "2.0.4+galaxy1", "type": "tool", "uuid": "9312ba36-4275-4d40-8ba6-95eea1b23b11", "when": null, "workflow_outputs": [{"label": "RepeatModeler seeds alignments", "output_name": "seeds", "uuid": "b5cd1bca-27c7-4887-904d-f59084d68291"}, {"label": "RepeatModeler consensus sequences", "output_name": "sequences", "uuid": "65e194d1-db7c-4783-97c4-d5051e4b64ea"}]}, "2": {"annotation": "", "content_id": 
"toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.5+galaxy0", "errors": null, "id": 2, "input_connections": {"input_fasta": {"id": 1, "output_name": "sequences"}}, "inputs": [], "label": null, "name": "RepeatMasker", "outputs": [{"name": "output_masked_genome", "type": "fasta"}, {"name": "output_log", "type": "tabular"}, {"name": "output_table", "type": "txt"}, {"name": "output_repeat_catalog", "type": "txt"}, {"name": "output_gff", "type": "gff"}], "position": {"left": 440, "top": 0}, "post_job_actions": {}, "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.5+galaxy0", "tool_shed_repository": {"changeset_revision": "ba6d2c32f797", "name": "repeat_masker", "owner": "bgruening", "tool_shed": "toolshed.g2.bx.psu.edu"}, "tool_state": "{\"__input_ext\": \"input\", \"advanced\": {\"is_only\": false, \"is_clip\": false, \"no_is\": false, \"rodspec\": false, \"primspec\": false, \"nolow\": false, \"noint\": false, \"norna\": false, \"alu\": false, \"div\": false, \"search_speed\": \"\", \"frag\": \"40000\", \"gc\": null, \"gccalc\": false, \"nocut\": false, \"xout\": false, \"keep_alignments\": false, \"invert_alignments\": false, \"poly\": false}, \"chromInfo\": \"/shared/ifbstor1/galaxy/mutable-config/tool-data/shared/ucsc/chrom/?.len\", \"excln\": true, \"gff\": true, \"input_fasta\": null, \"repeat_source\": {\"source_type\": \"dfam\", \"__current_case__\": 0, \"species_source\": {\"species_from_list\": \"no\", \"__current_case__\": 1, \"species_name\": \"\"}}, \"xsmall\": true, \"__page__\": null, \"__rerun_remap_job_id__\": null}", "tool_version": "4.1.5+galaxy0", "type": "tool", "uuid": "e6c8e6a1-efe8-4291-b12b-5fdb3795b6ca", "when": null, "workflow_outputs": [{"label": "RepeatMasker output log", "output_name": "output_log", "uuid": "8b69884b-5990-48f6-aa80-cf705586b313"}, {"label": "RepeatMasker repeat annotation", "output_name": "output_gff", "uuid": "6b96cc61-19be-4abd-b99d-7639ef6ed51f"}, 
{"label": "RepeatMasker masked genome", "output_name": "output_masked_genome", "uuid": "b0e0f699-3085-424d-82f0-b9c57361e8bb"}, {"label": "RepeatMasker repeat catalog", "output_name": "output_repeat_catalog", "uuid": "02b790ca-b3e7-441c-baf0-f3c1942cb8ca"}, {"label": "RepeatMasker repeat statistics", "output_name": "output_table", "uuid": "eb04a2c3-2461-4f8f-9f3f-c42143f79904"}]}}, "tags": [], "uuid": "f25be8fa-7823-456f-9707-a497703f48d7", "version": 0}
42 changes: 42 additions & 0 deletions workflows/repeatmasking/RepeatMasking-Workflow-tests.yml
@@ -0,0 +1,42 @@
- doc: Test outline for RepeatMasking Workflow
  job:
    input:
      class: File
      path: test-data/eco.fasta
      filetype: fasta

  outputs:
    RepeatModeler consensus sequences:
      path: test-data/repeatmodeler_output_sequences.fasta
      compare: sim_size
      delta: 30000

    RepeatModeler seeds alignments:
      path: test-data/repeatmodeler_output_seeds.stockholm
      compare: sim_size
      delta: 90000000

    RepeatMasker masked genome:
      path: test-data/repeatmasker_output_masked_genome.fasta
      compare: sim_size
      delta: 30000

    RepeatMasker output log:
      path: test-data/repeatmasker_output_log.tabular
      compare: sim_size
      delta: 30000

    RepeatMasker repeat statistics:
      path: test-data/repeatmasker_output_table.txt
      compare: sim_size
      delta: 30000

    RepeatMasker repeat catalog:
      path: test-data/repeatmasker_output_repeat_catalog.txt
      compare: sim_size
      delta: 30000

    RepeatMasker repeat annotation:
      path: test-data/repeatmasker_output_gff.gff
      compare: sim_size
      delta: 30000
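
Every output in the test file above is checked with `compare: sim_size` and a `delta`. As far as I understand Galaxy's test framework, this compares only the byte sizes of the expected and produced files and passes when they differ by at most `delta` bytes. The sketch below illustrates that assumed rule. `sim_size_ok` is a hypothetical helper, not Galaxy's own code, and the byte counts are made up:

```python
def sim_size_ok(expected: bytes, actual: bytes, delta: int) -> bool:
    """Assumed sim_size rule: pass if the sizes differ by at most `delta` bytes."""
    return abs(len(expected) - len(actual)) <= delta


# Hypothetical sizes, checked against the delta of 30000 used in the tests.
expected = b"x" * 100_000
print(sim_size_ok(expected, b"x" * 120_000, delta=30_000))  # True  (20000 bytes apart)
print(sim_size_ok(expected, b"x" * 140_000, delta=30_000))  # False (40000 bytes apart)
```

Under this reading, file contents are never compared, so a large `delta` such as 90000000 accepts almost any output of plausible size.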
175 changes: 175 additions & 0 deletions workflows/repeatmasking/RepeatMasking-Workflow.ga
@@ -0,0 +1,175 @@
{
"a_galaxy_workflow": "true",
"annotation": "",
"format-version": "0.1",
"license": "MIT",
"release": "0.1",
"name": "Repeat masking with RepeatModeler and RepeatMasker",
"creator": [
{
"class": "Person",
"email": "mailto:[email protected]",
"name": "Romane Libouban"
}
],
"steps": {
"0": {
"annotation": "",
"content_id": null,
"errors": null,
"id": 0,
"input_connections": {},
"inputs": [
{
"description": "Apply repeat masking to this fasta file",
"name": "input"
}
],
"label": "input",
"name": "Input dataset",
"outputs": [],
"position": {
"left": 10,
"top": 10
},
"tool_id": null,
"tool_state": "{\"optional\": false, \"tag\": null}",
"tool_version": null,
"type": "data_input",
"uuid": "ab5e19b0-ce35-4e54-a55e-f75243c86e3d",
"when": null,
"workflow_outputs": []
},
"1": {
"annotation": "",
"content_id": "toolshed.g2.bx.psu.edu/repos/csbl/repeatmodeler/repeatmodeler/2.0.4+galaxy1",
"errors": null,
"id": 1,
"input_connections": {
"input_file": {
"id": 0,
"output_name": "output"
}
},
"inputs": [],
"label": null,
"name": "RepeatModeler",
"outputs": [
{
"name": "sequences",
"type": "fasta"
},
{
"name": "seeds",
"type": "stockholm"
}
],
"position": {
"left": 230,
"top": 10
},
"post_job_actions": {},
"tool_id": "toolshed.g2.bx.psu.edu/repos/csbl/repeatmodeler/repeatmodeler/2.0.4+galaxy1",
"tool_shed_repository": {
"changeset_revision": "8661b2607b7e",
"name": "repeatmodeler",
"owner": "csbl",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
"tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/shared/ifbstor1/galaxy/mutable-config/tool-data/shared/ucsc/chrom/?.len\", \"input_file\": null, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "2.0.4+galaxy1",
"type": "tool",
"uuid": "9312ba36-4275-4d40-8ba6-95eea1b23b11",
"when": null,
"workflow_outputs": [
{
"output_name": "sequences",
"label": "RepeatModeler consensus sequences"
},
{
"output_name": "seeds",
"label": "RepeatModeler seeds alignments"
}
]
},
"2": {
"annotation": "",
"content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.5+galaxy0",
"errors": null,
"id": 2,
"input_connections": {
"input_fasta": {
"id": 1,
"output_name": "sequences"
}
},
"inputs": [],
"label": null,
"name": "RepeatMasker",
"outputs": [
{
"name": "output_masked_genome",
"type": "fasta"
},
{
"name": "output_log",
"type": "tabular"
},
{
"name": "output_table",
"type": "txt"
},
{
"name": "output_repeat_catalog",
"type": "txt"
},
{
"name": "output_gff",
"type": "gff"
}
],
"position": {
"left": 450,
"top": 10
},
"post_job_actions": {},
"tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.5+galaxy0",
"tool_shed_repository": {
"changeset_revision": "ba6d2c32f797",
"name": "repeat_masker",
"owner": "bgruening",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
"tool_state": "{\"__input_ext\": \"input\", \"advanced\": {\"is_only\": false, \"is_clip\": false, \"no_is\": false, \"rodspec\": false, \"primspec\": false, \"nolow\": false, \"noint\": false, \"norna\": false, \"alu\": false, \"div\": false, \"search_speed\": \"\", \"frag\": \"40000\", \"gc\": null, \"gccalc\": false, \"nocut\": false, \"xout\": false, \"keep_alignments\": false, \"invert_alignments\": false, \"poly\": false}, \"chromInfo\": \"/shared/ifbstor1/galaxy/mutable-config/tool-data/shared/ucsc/chrom/?.len\", \"excln\": true, \"gff\": true, \"input_fasta\": null, \"repeat_source\": {\"source_type\": \"dfam\", \"__current_case__\": 0, \"species_source\": {\"species_from_list\": \"no\", \"__current_case__\": 1, \"species_name\": \"\"}}, \"xsmall\": true, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "4.1.5+galaxy0",
"type": "tool",
"uuid": "e6c8e6a1-efe8-4291-b12b-5fdb3795b6ca",
"when": null,
"workflow_outputs": [
{
"output_name": "output_masked_genome",
"label": "RepeatMasker masked genome"
},
{
"output_name": "output_log",
"label": "RepeatMasker output log"
},
{
"output_name": "output_table",
"label": "RepeatMasker repeat statistics"
},
{
"output_name": "output_repeat_catalog",
"label": "RepeatMasker repeat catalog"
},
{
"output_name": "output_gff",
"label": "RepeatMasker repeat annotation"
}
]
}
},
"tags": [],
"uuid": "f25be8fa-7823-456f-9707-a497703f48d7",
"version": 0
}
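
To make the wiring of the workflow above easier to review, here is a minimal sketch that reads the `steps` section of a `.ga` file and prints which output feeds which input. The embedded JSON is a hand-trimmed copy of the fields used here (`name`, `input_connections`, `workflow_outputs`); the function names `describe_edges` and `output_labels` are hypothetical, and the code assumes each connection is a single object, as in this file:

```python
import json

# Trimmed copy of the workflow's "steps" section, keeping only the keys used below.
WORKFLOW_JSON = """
{
  "steps": {
    "0": {"name": "Input dataset", "input_connections": {}, "workflow_outputs": []},
    "1": {"name": "RepeatModeler",
          "input_connections": {"input_file": {"id": 0, "output_name": "output"}},
          "workflow_outputs": [
            {"output_name": "sequences", "label": "RepeatModeler consensus sequences"},
            {"output_name": "seeds", "label": "RepeatModeler seeds alignments"}]},
    "2": {"name": "RepeatMasker",
          "input_connections": {"input_fasta": {"id": 1, "output_name": "sequences"}},
          "workflow_outputs": [
            {"output_name": "output_masked_genome", "label": "RepeatMasker masked genome"},
            {"output_name": "output_log", "label": "RepeatMasker output log"},
            {"output_name": "output_table", "label": "RepeatMasker repeat statistics"},
            {"output_name": "output_repeat_catalog", "label": "RepeatMasker repeat catalog"},
            {"output_name": "output_gff", "label": "RepeatMasker repeat annotation"}]}
  }
}
"""


def describe_edges(workflow: dict) -> list:
    """Return 'upstream.output -> step.input' strings, in step order."""
    steps = workflow["steps"]
    edges = []
    for key in sorted(steps, key=int):
        step = steps[key]
        for input_name, source in step["input_connections"].items():
            upstream = steps[str(source["id"])]["name"]
            edges.append(f'{upstream}.{source["output_name"]} -> {step["name"]}.{input_name}')
    return edges


def output_labels(workflow: dict) -> list:
    """Return the labels of all workflow outputs, in step order."""
    steps = workflow["steps"]
    return [out["label"] for key in sorted(steps, key=int)
            for out in steps[key]["workflow_outputs"]]


workflow = json.loads(WORKFLOW_JSON)
for edge in describe_edges(workflow):
    print(edge)
print(len(output_labels(workflow)), "labelled outputs")
```

Running it shows that RepeatMasker's `input_fasta` is connected to RepeatModeler's `sequences` output, matching the README's statement that RepeatMasker takes the FASTA file generated by RepeatModeler.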