ChromHMM enhancer extension #68

Merged
merged 22 commits into from
Jan 24, 2024
eb4d9ea
Switch to RData based approach in ehmm
nictru Nov 30, 2023
8820918
Added initial chromHMM workflow
LeonHafner Nov 30, 2023
60a6bc9
Added ChromHMM GET_OUTPUT
LeonHafner Dec 7, 2023
86cd8c9
Added openjdk docker container
LeonHafner Dec 7, 2023
0584335
Added process labels
LeonHafner Dec 8, 2023
5a05bc0
Removed obsolete ChromHMM files and added chromsizes to igenomes config
LeonHafner Dec 13, 2023
95a1d91
Removed regular (non igenomes) chromsizes files
LeonHafner Dec 13, 2023
39e3df9
Moved ChromHMM executable to bin
LeonHafner Dec 13, 2023
2d8f9a5
Added metadata propagation to chromhmm
LeonHafner Dec 13, 2023
6d15e8b
Modified ignore files
LeonHafner Dec 26, 2023
bcd52b6
Integrated ChromHMM into pipeline
LeonHafner Dec 26, 2023
8c6c89e
Added ChromHMM docker container
LeonHafner Dec 26, 2023
ef90358
Added ROSE workflow
LeonHafner Jan 6, 2024
900bb75
Added more ucsc files to igenomes
LeonHafner Jan 6, 2024
7b7f492
Added stub to RUN_ROSE
LeonHafner Jan 9, 2024
6aa60a6
Integrate output of Rose into workflow and disable eHMM
LeonHafner Jan 9, 2024
7ba5e99
Changed shebang according to: https://stackoverflow.com/a/19305076
LeonHafner Jan 15, 2024
86cc1b9
Replaced ROSE_OUTPUT_TO_BED with nf-core gawk
LeonHafner Jan 17, 2024
10e5894
Removed obsolete ROSE_OUTPUT_TO_BED file
LeonHafner Jan 17, 2024
a13bb24
Replaced BED_TO_GFF with nf-core gawk
LeonHafner Jan 18, 2024
b2da525
Changed ROSE_OUTPUT_TO_BED output file name
LeonHafner Jan 18, 2024
d94cb47
Replaced REFORMAT_GFF with nf-core gawk
LeonHafner Jan 18, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,3 +6,5 @@ results/
testing/
testing*
*.pyc
*.swp
runner/
1 change: 1 addition & 0 deletions .prettierignore
@@ -10,3 +10,4 @@ testing/
testing*
*.pyc
bin/
runner/
Binary file added bin/ChromHMM.jar
Binary file not shown.
2 changes: 1 addition & 1 deletion bin/combine_tables.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3

import argparse
import numpy as np
47 changes: 47 additions & 0 deletions bin/get_chromhmm_results.py
@@ -0,0 +1,47 @@
#!/usr/bin/env python3
# coding: utf-8

import argparse
import pandas as pd
import numpy as np


parser = argparse.ArgumentParser(description="Process ChromHMM output into bed file of predicted enhancers")

parser.add_argument("-e", "--emissions", type=str, required=True, help="Path to emission file")
parser.add_argument("-b", "--bed", type=str, required=True, help="Path to bed file")
parser.add_argument("-t", "--threshold", type=float, required=False, default=0.9, help="Threshold for state emissions")
parser.add_argument("-m", "--markers", nargs='+', required=False, default=["H3K27ac", "H3K4me3"], help="ChIP-Seq markers that indicate an enhancer")
parser.add_argument("-o", "--output", type=str, required=True, help="Path to output bed with enhancer positions")

args = parser.parse_args()

path_emissions = args.emissions
path_bed = args.bed
threshold = args.threshold
markers = args.markers
output = args.output


# Read emissions file for the provided markers
emissions = pd.read_csv(path_emissions, sep = "\t")[["State (Emission order)"] + markers].rename(columns={"State (Emission order)": "State"})


# Read input bed file and remove unnecessary columns
bed = pd.read_csv(
    path_bed,
    sep="\t",
    skiprows=1,
    names=["chr", "start", "end", "state", "score", "strand", "start_1", "end_1", "rgb"],
).drop(columns=["strand", "score", "start_1", "end_1", "rgb"])


# Keep state if the emission of any marker is >= threshold for this state
states = emissions[np.any([emissions[marker] >= threshold for marker in markers], axis=0)]["State"].tolist()


# Subset bed file for selected states
out_bed = bed[np.isin(bed["state"], states)].drop(columns=["state"])

# Write output
out_bed.to_csv(output, index=False, sep="\t", header=False)
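
Note on `bin/get_chromhmm_results.py`: the selection rule above (keep a state when any marker's emission reaches the threshold, then keep only segments in those states) can be illustrated without pandas. The following is a minimal, dependency-free sketch, not the script itself. The emission values, segment tuples, and the helper names `select_states` and `filter_segments` are hypothetical and exist only for illustration.

```python
# Hypothetical emission probabilities per state (illustrative values only).
emissions = {
    1: {"H3K27ac": 0.95, "H3K4me3": 0.10},
    2: {"H3K27ac": 0.20, "H3K4me3": 0.05},
    3: {"H3K27ac": 0.40, "H3K4me3": 0.90},
}


def select_states(emissions, markers, threshold=0.9):
    """Return states where at least one marker's emission is >= threshold."""
    return [
        state
        for state, probs in emissions.items()
        if any(probs[marker] >= threshold for marker in markers)
    ]


def filter_segments(segments, states):
    """Keep (chr, start, end) for segments whose state was selected."""
    keep = set(states)
    return [(chrom, start, end) for chrom, start, end, state in segments if state in keep]


# Hypothetical segmentation rows: (chr, start, end, state).
segments = [("chr1", 0, 200, 1), ("chr1", 200, 400, 2), ("chr2", 0, 600, 3)]

enhancer_states = select_states(emissions, ["H3K27ac", "H3K4me3"])
enhancers = filter_segments(segments, enhancer_states)
```

The comparison is inclusive, so a state whose emission equals the threshold exactly (state 3 here) is kept, which matches the `>=` used in the script.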

23 changes: 23 additions & 0 deletions bin/make_cellmarkfiletable.py
@@ -0,0 +1,23 @@
#!/usr/bin/env python3
# coding: utf-8

import os
import argparse
import pandas as pd


# Creates a cellmarkfiletable which is needed as input for ChromHMM
parser = argparse.ArgumentParser(description="Strip directory components from the file paths in the input table so that it fits into the Nextflow workflow")
parser.add_argument("--input", help="Path to tab-separated input table", required=True, type=str)
parser.add_argument("--output", help="Path for output file", required=True, type=str)

args = parser.parse_args()

# Named input_path to avoid shadowing the built-in input()
input_path = args.input
output = args.output

table = pd.read_csv(input_path, sep="\t", names=["state", "assay", "bam", "control"])

table["bam"] = [os.path.basename(path) for path in table["bam"]]
table["control"] = [os.path.basename(path) for path in table["control"]]
table.to_csv(output, header=False, sep="\t", index=False)
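
Note on `bin/make_cellmarkfiletable.py`: the effect of the basename step can be seen on a single row. This is a small standard-library sketch under stated assumptions, not the script's implementation. The sample row, the file paths, and the helper name `strip_paths` are made up for illustration.

```python
import csv
import io
import os


def strip_paths(rows):
    """Replace the file paths in columns 3 and 4 with their basenames."""
    cleaned = []
    for row in rows:
        head, files = row[:2], row[2:4]
        cleaned.append(head + [os.path.basename(path) for path in files])
    return cleaned


# Hypothetical tab-separated input row with absolute paths.
raw = "K562\tH3K27ac\t/data/bams/k562_h3k27ac.bam\t/data/bams/k562_input.bam\n"
rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))
table = strip_paths(rows)
```

Slicing with `row[2:4]` means a row that lacks the control column still passes through this sketch, which may be worth considering for the pandas version if control files are optional.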
21 changes: 21 additions & 0 deletions bin/reformat_bam.sh
@@ -0,0 +1,21 @@
#!/bin/bash

# Reheaders a bam file and adds 'chr' to each chromosome
# $1 is the input bam
# $2 is the output bam

# A single sed call replaces the original chain: names beginning with 1-9, X or Y
# receive the 'chr' prefix (this also covers 10-22), and MT becomes chrM
samtools view -H "$1" | \
    sed -E -e 's/SN:([1-9XY])/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | \
    samtools reheader - "$1" > "$2"
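
Note on `bin/reformat_bam.sh`: the substitutions in the script match on a prefix of the sequence name, so any contig whose name merely starts with a digit, `X` or `Y` would also be renamed. A stricter variant that only touches `@SQ` lines and only renames exact matches is sketched below in Python. It is an alternative for discussion, not what the script does. The function name `add_chr_prefix` and the sample header lines are hypothetical.

```python
import re

# Matches the SN field of a SAM header line, up to the next tab or line end.
_SN_FIELD = re.compile(r"\tSN:([^\t\n]+)")

# Primary chromosome names without the 'chr' prefix: 1-22, X and Y.
_PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def add_chr_prefix(header_line):
    """Prefix primary chromosome names in an @SQ line with 'chr'; MT becomes chrM."""
    if not header_line.startswith("@SQ"):
        return header_line

    def rename(match):
        name = match.group(1)
        if name == "MT":
            return "\tSN:chrM"
        if name in _PRIMARY:
            return "\tSN:chr" + name
        # Leave scaffolds and already-prefixed names untouched.
        return match.group(0)

    return _SN_FIELD.sub(rename, header_line, count=1)


# Hypothetical header lines for illustration.
header = ["@HD\tVN:1.6", "@SQ\tSN:1\tLN:1000", "@SQ\tSN:MT\tLN:500", "@SQ\tSN:GL000192.1\tLN:42"]
renamed = [add_chr_prefix(line) for line in header]
```

The rewritten header lines could then be written to a file and passed to `samtools reheader` in the same way the script pipes its `sed` output.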