feat: updated version of pretext workflow developed by Delphine that takes HiFi and HiC as input. #584

Open: wants to merge 27 commits into base: main

Commits (27):
- 35d936d feat: updated version of pretext workflow developed by Delphine that … (Smeds, Oct 28, 2024)
- 5ce525d Fix connections between optional parameters and mandatory inputs,... (Delphine-L, Nov 5, 2024)
- 4e20b69 add license, authors, and diverse labels and tags (Delphine-L, Nov 12, 2024)
- b666689 change telomere track height to correspond to the size of the telomeres (Delphine-L, Nov 12, 2024)
- 42a7240 Add README and CHANGELOG (Delphine-L, Nov 12, 2024)
- e6c4591 update pretextgraph to fix track display issue, and add parameters to… (Delphine-L, Nov 14, 2024)
- 6cc5b23 add new parameter to the README (Delphine-L, Nov 14, 2024)
- 851f762 Remove explicit data conversion, it got fixed in galaxy (Delphine-L, Nov 20, 2024)
- 0cf305a Rename folder (Delphine-L, Nov 20, 2024)
- 2a0c7bb add release number and remove parameter from the READMA (Delphine-L, Nov 20, 2024)
- 4de0fbc expose output to be used for Jbrowse2 workflow (Delphine-L, Nov 20, 2024)
- 3cf74ab add tests (Delphine-L, Nov 26, 2024)
- 95bd01c replace dockstore file after cleaning my local repository (Delphine-L, Nov 26, 2024)
- 2779664 add Marius comments (Delphine-L, Nov 26, 2024)
- d4ba83e rename folder (Delphine-L, Nov 26, 2024)
- ce121eb Make more clear that HiC reads need to be in collections (Delphine-L, Nov 26, 2024)
- a8fc19f use smaller test data (Delphine-L, Nov 26, 2024)
- c4611ad Remove unnecessary specification that the input is a collection (Delphine-L, Nov 27, 2024)
- 1ff2cd8 add marius comments and more details about the inputs (Delphine-L, Nov 27, 2024)
- 306265e correct typo (Delphine-L, Nov 27, 2024)
- 4707632 use even smaller data (Delphine-L, Nov 27, 2024)
- 14c03d2 rename output of Gfastats (Delphine-L, Nov 27, 2024)
- 425d64c update tools (Delphine-L, Dec 13, 2024)
- db8b8ad renaming folder with more descriptive name (Delphine-L, Dec 16, 2024)
- 67f1492 change folder name to lower case (Delphine-L, Dec 16, 2024)
- 0911819 Improve name of the Workflow (Delphine-L, Dec 16, 2024)
- 3264afa update dockstore file (Delphine-L, Dec 16, 2024)
.dockstore.yml
@@ -0,0 +1,13 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /hi-c-map-for-assembly-manual-curation.ga
  testParameterFiles:
  - /hi-c-map-for-assembly-manual-curation-tests.yml
  authors:
  - name: Patrik Smeds
    orcid: 0000-0001-6228-2785
  - name: Delphine Lariviere
    orcid: 0000-0001-6421-3484
CHANGELOG.md
@@ -0,0 +1,5 @@
# Changelog

## [1.0] 2024-11-12

- Initial release: workflow that generates Hi-C contact maps with coverage, gap, and telomere tracks
README.md
@@ -0,0 +1,30 @@
# Hi-C contact map generation for manual curation of genome assemblies: 2 haplotypes

This workflow generates Hi-C contact maps in the Pretext format for diploid genome assemblies (two haplotypes). The maps include tracks for PacBio read coverage, gaps, and telomeres. The Pretext files can be opened in PretextView for manual curation of the assemblies.


## Inputs

1. **Haplotype 1** [fasta]
2. **Haplotype 2** [fasta]
3. **Do you want to add suffixes to the scaffold names?** Select *yes* if the scaffold names in your assembly do not contain haplotype information, so that scaffolds from the two haplotypes keep distinct names once the haplotypes are concatenated.
4. **Haplotype 1 suffix** Suffix appended to the haplotype 1 scaffold names when suffixes are enabled.
5. **Haplotype 2 suffix** Suffix appended to the haplotype 2 scaffold names when suffixes are enabled.
6. **Hi-C reads - forward** [fastq] Collection containing the Hi-C forward reads
7. **Hi-C reads - reverse** [fastq] Collection containing the Hi-C reverse reads
8. **Do you want to trim the Hi-C data?** If *yes*, 5 bp are trimmed from the end of each Hi-C read. Use this option with Arima Hi-C data if the Hi-C map looks "noisy".
9. **Telomere repeat to suit species** Telomeric repeat motif to search for. The default value, `CCCTAA`, suits vertebrates.
10. **PacBio reads** [fastq] Collection of PacBio reads.
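As a rough illustration of inputs 3 to 5, here is a minimal Python sketch of suffixing and concatenating the two haplotypes. It is not the workflow's implementation: `add_suffix` and `merge_haplotypes` are hypothetical helpers, and the `.` separator is an assumption inferred from the test expectation that `scaffold_10` becomes `scaffold_10.hap1`.

```python
from typing import Iterable, Iterator, List


def add_suffix(fasta_lines: Iterable[str], suffix: str) -> Iterator[str]:
    """Append '.<suffix>' to the sequence name on every FASTA header line.

    Hypothetical helper; the '.' separator is an assumption taken from the
    test expectation that 'scaffold_10' becomes 'scaffold_10.hap1'.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            # The name is the first word; anything after it is a description.
            name, _, description = line[1:].partition(" ")
            header = f">{name}.{suffix}"
            yield f"{header} {description}" if description else header
        else:
            # Sequence lines pass through unchanged.
            yield line


def merge_haplotypes(
    hap1: Iterable[str],
    hap2: Iterable[str],
    add_suffixes: bool,
    suffix1: str = "hap1",
    suffix2: str = "hap2",
) -> List[str]:
    """Concatenate two haplotype FASTAs, optionally suffixing scaffold names.

    Without suffixes, a scaffold present in both haplotypes (e.g. scaffold_10)
    would appear twice under the same name in the merged file.
    """
    if add_suffixes:
        hap1 = add_suffix(hap1, suffix1)
        hap2 = add_suffix(hap2, suffix2)
    return [*hap1, *hap2]


if __name__ == "__main__":
    merged = merge_haplotypes(
        [">scaffold_10", "ACGT"], [">scaffold_10", "TTGA"], add_suffixes=True
    )
    print("\n".join(merged))
```

Run as a script, it prints `>scaffold_10.hap1`, `ACGT`, `>scaffold_10.hap2`, and `TTGA` on separate lines, which is why the suffixes matter before the haplotypes are mapped against as one reference.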


## Outputs

1. Concatenated Assembly [fasta]
2. Trimmed Hi-C data (if the trimming option is selected) [fastq]
3. Mapped Hi-C reads [bam]
4. Telomeres track [bedgraph]
5. Gap track [bedgraph]
6. Coverage track [bigwig]
7. Pretext Map without tracks [pretext]
8. Pretext Map with tracks [pretext]
9. Pretext Snapshot image of the Hi-C contact map [png]
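Judging from the test file, the telomere and gap tracks are bedGraph files whose value is the length of each interval: a telomere reported by seqtk telo as `scaffold_10.hap2 0 11012 139653677` is expected in the track as `scaffold_10.hap2 0 11012 11012`, and a 200 bp gap gets the value 200. The Python sketch below reproduces that conversion under those assumptions. `find_gaps` scans for runs of `N` as a stand-in for the workflow's own gap detection (which appears to rely on gfastats), all function names are hypothetical, and the four-column layout of the seqtk telo rows (name, start, end, sequence length) is inferred from the test data.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# (sequence name, 0-based start, exclusive end), as in a BED file.
Interval = Tuple[str, int, int]


def find_gaps(name: str, seq: str) -> Iterator[Interval]:
    """Yield one BED interval per run of N/n bases (an assembly gap)."""
    for match in re.finditer(r"[Nn]+", seq):
        yield name, match.start(), match.end()


def length_bedgraph(intervals: Iterable[Interval]) -> List[str]:
    """Render intervals as bedGraph lines whose value is the interval length."""
    return [f"{chrom}\t{start}\t{end}\t{end - start}" for chrom, start, end in intervals]


def telo_to_bedgraph(telo_rows: Iterable[str]) -> List[str]:
    """Convert assumed 4-column seqtk telo rows to a length-valued bedGraph."""
    intervals = []
    for row in telo_rows:
        # The fourth column (total sequence length) is not needed for the track.
        chrom, start, end, _seq_len = row.split()
        intervals.append((chrom, int(start), int(end)))
    return length_bedgraph(intervals)


if __name__ == "__main__":
    for line in length_bedgraph(find_gaps("scaffold_1", "ACGTNNNNACGT")):
        print(line)
    for line in telo_to_bedgraph(["scaffold_10.hap2\t0\t11012\t139653677"]):
        print(line)
```

With this design the track height directly encodes telomere or gap length, so longer features stand out in PretextView; the telomere row above becomes the tab-separated line `scaffold_10.hap2 0 11012 11012`.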
hi-c-map-for-assembly-manual-curation-tests.yml
@@ -0,0 +1,87 @@
- doc: Test outline for hi-c-map-for-assembly-manual-curation.ga
  job:
    Haplotype 1:
      class: File
      location: https://zenodo.org/records/14230702/files/Haplotype%201.fasta
      filetype: fasta
    Haplotype 2:
      class: File
      location: https://zenodo.org/records/14230702/files/Haplotype%202.fasta
      filetype: fasta
    Hi-C reads - forward:
      class: Collection
      collection_type: list
      elements:
        - class: File
          identifier: HiC forward reads
          location: https://zenodo.org/records/14230702/files/HiC%20forward.fastqsanger.gz
    Hi-C reads - reverse:
      class: Collection
      collection_type: list
      elements:
        - class: File
          identifier: HiC reverse reads
          location: https://zenodo.org/records/14230702/files/HiC%20reverse.fastqsanger.gz
    PacBio reads:
      class: Collection
      collection_type: list
      elements:
        - class: File
          identifier: PacBio reads.fastq.gz
          location: https://zenodo.org/records/14230702/files/PacBio%20reads.fastq.gz
    Do you want to add suffixes to the scaffold names?: true
    Haplotype 1 suffix: hap1
    Haplotype 2 suffix: hap2
    Do you want to trim the Hi-C data?: true
    Telomere repeat to suit species: CCCTAA
  outputs:
    Merged Haplotypes:
      asserts:
        - that: has_text
          text: ">scaffold_10.hap1"
        - that: has_text
          text: ">scaffold_10.hap2"
    Gaps Bed:
      asserts:
        - that: has_text
          text: "scaffold_10.hap1 34145604 34145804"
        - that: has_text
          text: "scaffold_10.hap2 137138839 137139039"
    Seqtk-telo Output:
      asserts:
        has_text:
          text: "scaffold_10.hap2 0 11012 139653677"
    Gaps Bedgraph:
      asserts:
        has_text:
          text: "scaffold_10.hap2 137138839 137139039 200"
    BigWig Coverage:
      asserts:
        has_size:
          value: 112000
          delta: 4000
    Telomeres Bedgraph:
      asserts:
        has_text:
          text: "scaffold_10.hap2 0 11012 11012"
    Trimmed Hi-C Forward Reads:
      asserts:
        has_size:
          value: 13900000
          delta: 2000000
    Trimmed Hi-C Reverse Reads:
      asserts:
        has_size:
          value: 14300000
          delta: 2000000
    Merged Hi-C Alignments:
      asserts:
        has_size:
          value: 4600000
          delta: 1000000
    Pretext All tracks:
      asserts:
        has_size:
          value: 946900
          delta: 40000
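To help read the test file, here is a hedged Python approximation of how the two assertion types it uses are evaluated: `has_text` passes when the expected string occurs in the output, and `has_size` passes when the output size in bytes lies within `value` ± `delta`. The helper names are hypothetical and not Galaxy's API, and the inclusive bounds are an assumption.

```python
from typing import Iterable


def check_has_size(actual_bytes: int, value: int, delta: int = 0) -> bool:
    """Pass when the output size lies within value +/- delta (inclusive bounds assumed)."""
    return abs(actual_bytes - value) <= delta


def check_has_text(content: str, expected: Iterable[str]) -> bool:
    """Pass only when every expected substring occurs somewhere in the output."""
    return all(text in content for text in expected)


if __name__ == "__main__":
    # BigWig Coverage: value 112000 with delta 4000 accepts 108000..116000 bytes.
    print(check_has_size(115500, value=112000, delta=4000))  # True
    print(check_has_size(117000, value=112000, delta=4000))  # False
    merged = ">scaffold_10.hap1\nACGT\n>scaffold_10.hap2\nTTGA\n"
    print(check_has_text(merged, [">scaffold_10.hap1", ">scaffold_10.hap2"]))  # True
```

Size checks with a tolerance suit binary or compressed outputs (BigWig, BAM, Pretext, gzipped FASTQ), whose exact bytes can vary between tool versions, while text checks pin down specific scaffold names and coordinates.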
