Add Repeatmasking workflow #198

Merged
merged 8 commits into from
Sep 21, 2023
11 changes: 11 additions & 0 deletions workflows/repeatmasking/.dockstore.yml
@@ -0,0 +1,11 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /RepeatMasking-Workflow.ga
  testParameterFiles:
  - /RepeatMasking-Workflow-tests.yml
  authors:
  - name: Romane Libouban
    email: mailto:[email protected]
5 changes: 5 additions & 0 deletions workflows/repeatmasking/.workflowhub.yml
@@ -0,0 +1,5 @@
version: '0.1'
registries:
- url: https://workflowhub.eu
  project: iwc
  workflow: RepeatMasking-Workflow./main
5 changes: 5 additions & 0 deletions workflows/repeatmasking/CHANGELOG.md
@@ -0,0 +1,5 @@
# Changelog

## [0.1]

Initial version of the RepeatMasking workflow for genomic sequencing data.
29 changes: 29 additions & 0 deletions workflows/repeatmasking/README.md
@@ -0,0 +1,29 @@
# RepeatMasking Workflow

This workflow uses RepeatModeler and RepeatMasker to identify and mask repetitive elements in genome sequences.

- RepeatModeler is a software package for the de novo identification and modeling of transposable element (TE) families. At the heart of RepeatModeler are three de novo repeat-finding programs (RECON, RepeatScout and LtrHarvest/Ltr_retriever), which use complementary computational methods to identify repeat element boundaries and family relationships from sequence data.

- RepeatMasker is a program that screens DNA sequences for *interspersed repeats* and *low-complexity* DNA sequences. Its output is a detailed annotation of the repeats present in the query sequence, together with a modified version of the query sequence in which all annotated repeats have been masked.

## Input dataset for RepeatModeler
- RepeatModeler requires a single input file, a genome in fasta format.


## Output datasets for RepeatModeler
- Two output files are generated:
  - a FASTA file of consensus sequences for the identified repeat families (`sequences`)
  - a Stockholm file containing the seed alignment of each family (`seeds`)
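The consensus FASTA (`sequences`) can be summarized by repeat class. RepeatModeler usually appends each family's classification to its identifier after a `#` (for example `rnd-1_family-1#LINE/L1`); treat that header format as an assumption to check against your own output. The following is a minimal Python sketch, where `family_classes` is a hypothetical helper and not part of this workflow:

```python
from collections import Counter
from typing import Iterable


def family_classes(fasta_lines: Iterable[str], top_level: bool = False) -> Counter:
    """Count repeat families per classification from FASTA header lines.

    Assumes RepeatModeler-style identifiers of the form 'family_id#Class/Subclass';
    identifiers without a classification are counted under 'Unknown'.
    """
    counts: Counter = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no classification
        tokens = line[1:].split()
        ident = tokens[0] if tokens else ""
        _, sep, label = ident.partition("#")
        label = label if sep and label else "Unknown"
        if top_level:
            label = label.split("/")[0]  # e.g. 'LINE/L1' -> 'LINE'
        counts[label] += 1
    return counts
```

For example, `family_classes(open("sequences.fasta"), top_level=True).most_common()` (file name hypothetical) lists the most frequent top-level classes first.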


## Input dataset for RepeatMasker
- RepeatMasker takes as input the FASTA file of consensus sequences (`sequences`) generated by RepeatModeler.

## Output datasets for RepeatMasker
- Five output files are generated:
  - a FASTA file of the masked sequence
  - a GFF3 file of the repeat annotations
  - a table summarizing the repeat content of the analyzed sequence
  - a file with statistics on the repeat content of the analyzed sequence
  - a summary of the mutation sites found and the order of grouping
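Because the RepeatMasker step in this workflow is configured with `xsmall` set to true, repeats in `output_masked_genome` should be soft-masked (written in lowercase) rather than replaced with `N`. Under that assumption, a rough repeat fraction can be read straight from the masked FASTA. The following is a minimal Python sketch with a hypothetical `soft_masked_fraction` helper; any lowercase bases already present in the input would also be counted as masked:

```python
from typing import Iterable


def soft_masked_fraction(fasta_lines: Iterable[str]) -> float:
    """Fraction of sequence letters that are lowercase (soft-masked).

    Assumes RepeatMasker's -xsmall behaviour: repeats in lowercase, the rest
    in uppercase. Header lines are skipped; 'N' bases count toward the total.
    """
    masked = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        for base in line:
            if base.isalpha():
                total += 1
                if base.islower():
                    masked += 1
    return masked / total if total else 0.0
```

Reading the file line by line keeps memory use flat, which matters for genome-sized FASTA files.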

34 changes: 34 additions & 0 deletions workflows/repeatmasking/RepeatMasking-Workflow-tests.yml
@@ -0,0 +1,34 @@
- doc: Test outline for RepeatMasking Workflow
job:
sequence.fasta:
class: File
path: test-data/sequence.fasta
filetype: fasta

# inputs:
# sequence.fasta.fasta: sequence.fasta

outputs:
'sequences':
sequence:
path: test-data/repeatmodeler_output_sequences.fasta
compare: contents
seeds:
path: test-data/repeatmodeler_output_seeds.stockholm
compare: contents
'output_masked_genome':
output_masked_genome:
path: test-data/repeatmasker_output_masked_genome.fasta
compare: contents
output_log:
path: test-data/repeatmasker_output_log.tabular
compare: contents
output_table:
path: test-data/repeatmasker_output_table.txt
compare: contents
output_repeat_catalog:
path: test-data/repeatmasker_output_repeat_catalog.txt
compare: contents
output_gff:
path: test-data/repeatmasker_output_gff.gff
compare: contents
168 changes: 168 additions & 0 deletions workflows/repeatmasking/RepeatMasking-Workflow.ga
@@ -0,0 +1,168 @@
{
"a_galaxy_workflow": "true",
"annotation": "",
"format-version": "0.1",
"license": "MIT",
"release": "0.1",
"name": "Workflow constructed from history 'RepeatMasking'",
"steps": {
"0": {
"annotation": "",
"content_id": null,
"errors": null,
"id": 0,
"input_connections": {},
"inputs": [
{
"description": "",
"name": "sequence.fasta"
}
],
"label": "sequence.fasta",
"name": "Input dataset",
"outputs": [],
"position": {
"left": 10,
"top": 10
},
"tool_id": null,
"tool_state": "{\"optional\": false, \"tag\": null}",
"tool_version": null,
"type": "data_input",
"uuid": "ab5e19b0-ce35-4e54-a55e-f75243c86e3d",
"when": null,
"workflow_outputs": []
},
"1": {
"annotation": "",
"content_id": "toolshed.g2.bx.psu.edu/repos/csbl/repeatmodeler/repeatmodeler/2.0.4+galaxy1",
"errors": null,
"id": 1,
"input_connections": {
"input_file": {
"id": 0,
"output_name": "output"
}
},
"inputs": [],
"label": null,
"name": "RepeatModeler",
"outputs": [
{
"name": "sequences",
"type": "fasta"
},
{
"name": "seeds",
"type": "stockholm"
}
],
"position": {
"left": 230,
"top": 10
},
"post_job_actions": {},
"tool_id": "toolshed.g2.bx.psu.edu/repos/csbl/repeatmodeler/repeatmodeler/2.0.4+galaxy1",
"tool_shed_repository": {
"changeset_revision": "8661b2607b7e",
"name": "repeatmodeler",
"owner": "csbl",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
"tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/shared/ifbstor1/galaxy/mutable-config/tool-data/shared/ucsc/chrom/?.len\", \"input_file\": null, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "2.0.4+galaxy1",
"type": "tool",
"uuid": "9312ba36-4275-4d40-8ba6-95eea1b23b11",
"when": null,
"workflow_outputs": [
{
"output_name": "sequences",
"label": "sequences"
},
{
"output_name": "seeds",
"label": "seeds"
}
]
},
"2": {
"annotation": "",
"content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.5+galaxy0",
"errors": null,
"id": 2,
"input_connections": {
"input_fasta": {
"id": 1,
"output_name": "sequences"
}
},
"inputs": [],
"label": null,
"name": "RepeatMasker",
"outputs": [
{
"name": "output_masked_genome",
"type": "fasta"
},
{
"name": "output_log",
"type": "tabular"
},
{
"name": "output_table",
"type": "txt"
},
{
"name": "output_repeat_catalog",
"type": "txt"
},
{
"name": "output_gff",
"type": "gff"
}
],
"position": {
"left": 450,
"top": 10
},
"post_job_actions": {},
"tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.5+galaxy0",
"tool_shed_repository": {
"changeset_revision": "ba6d2c32f797",
"name": "repeat_masker",
"owner": "bgruening",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
"tool_state": "{\"__input_ext\": \"input\", \"advanced\": {\"is_only\": false, \"is_clip\": false, \"no_is\": false, \"rodspec\": false, \"primspec\": false, \"nolow\": false, \"noint\": false, \"norna\": false, \"alu\": false, \"div\": false, \"search_speed\": \"\", \"frag\": \"40000\", \"gc\": null, \"gccalc\": false, \"nocut\": false, \"xout\": false, \"keep_alignments\": false, \"invert_alignments\": false, \"poly\": false}, \"chromInfo\": \"/shared/ifbstor1/galaxy/mutable-config/tool-data/shared/ucsc/chrom/?.len\", \"excln\": true, \"gff\": true, \"input_fasta\": null, \"repeat_source\": {\"source_type\": \"dfam\", \"__current_case__\": 0, \"species_source\": {\"species_from_list\": \"no\", \"__current_case__\": 1, \"species_name\": \"\"}}, \"xsmall\": true, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "4.1.5+galaxy0",
"type": "tool",
"uuid": "e6c8e6a1-efe8-4291-b12b-5fdb3795b6ca",
"when": null,
"workflow_outputs": [
{
"output_name": "output_masked_genome",
"label": "output_masked_genome"
},
{
"output_name": "output_log",
"label": "output_log"
},
{
"output_name": "output_table",
"label": "output_table"
},
{
"output_name": "output_repeat_catalog",
"label": "output_repeat_catalog"
},
{
"output_name": "output_gff",
"label": "output_gff"
}
]
}
},
"tags": [],
"uuid": "f25be8fa-7823-456f-9707-a497703f48d7",
"version": 0
}
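The step wiring in `RepeatMasking-Workflow.ga` (step 0 feeds RepeatModeler's `input_file`, whose `sequences` output feeds RepeatMasker's `input_fasta`) can be sanity-checked by walking each step's `input_connections` and confirming that every referenced step and output name exists. The following is a minimal Python sketch, not part of this PR; `check_connections` and `data_flow` are hypothetical helpers, and the sketch assumes that `data_input` steps expose an implicit output named `output`, as the connection into step 1 suggests:

```python
from typing import Any, Iterator


def declared_outputs(step: dict[str, Any]) -> set[str]:
    """Output names a step can provide to downstream steps."""
    # Assumption: data_input steps list no outputs in the .ga file but are
    # referenced downstream under the implicit name "output".
    if step.get("type") == "data_input":
        return {"output"}
    return {out["name"] for out in step.get("outputs", [])}


def _connections(step: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (input_name, connection) pairs; a connection may also be a list."""
    for input_name, conn in step.get("input_connections", {}).items():
        for c in conn if isinstance(conn, list) else [conn]:
            yield input_name, c


def check_connections(workflow: dict[str, Any]) -> list[str]:
    """Return human-readable wiring problems (an empty list means none found)."""
    steps = workflow["steps"]
    problems = []
    for key, step in steps.items():
        for input_name, c in _connections(step):
            source = steps.get(str(c["id"]))
            if source is None:
                problems.append(f"step {key}: {input_name} uses missing step {c['id']}")
            elif c["output_name"] not in declared_outputs(source):
                problems.append(
                    f"step {key}: {input_name} uses unknown output "
                    f"{c['output_name']!r} of step {c['id']}"
                )
    return problems


def data_flow(workflow: dict[str, Any]) -> list[tuple[int, str, int, str]]:
    """List edges as (source_step, output_name, target_step, input_name)."""
    edges = []
    for key, step in workflow["steps"].items():
        for input_name, c in _connections(step):
            edges.append((int(c["id"]), c["output_name"], int(key), input_name))
    return sorted(edges)
```

Typical use is `check_connections(json.load(open("RepeatMasking-Workflow.ga")))`; for the workflow above it should report no problems, and `data_flow` should list the two edges 0 → 1 and 1 → 2.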