DESeq for Differential Chromatin Accessibility

This pipeline can be used to conduct a preliminary analysis of differences in chromatin accessibility between two user-defined groupings. The pipeline begins with a list of aggregation ID's and user-specified groupings to compare. The script takes care of all pre-processing steps, runs a DESeq2 comparison and then outputs bed files of all differences, significant differences, and visualizations.

More information on DESeq2

Run-time assumptions

User has LIMS API token setup and stored in ~/.bash_profile as LIMS_API_TOKEN
Aggregation level cutcounts and called peak files are available in LIMS directory

Installation/Requirements

module purge
module load deseq_pipeline

Note: If you are having issues loading modules, check $LD_LIBRARY_PATH

To run:

1. Create a text file of aggregate id's to analyze and their grouping metadata

agg_id   condition   
AG5536     WT          
AG5537     DNMT3+

If your sample does not have a LIMS-generated aggregation ID, you can also just provide file paths to the relevant cutcounts and peaks files. You can also provide a combination of aggregate ID's and file paths:

agg_id    condition    cutcounts                                          peaks
sample1    WT          /home/cutcount_dir/sample1_cutcounts.bed.starch    /home/peaks_dir/sample1_peaks.bed.starch
AG5536     HET 
sample2    HET         /home/cutcount_dir/sample2_cutcounts.bed.starch    /home/peaks_dir/sample2_peaks.bed.starch

2. Create a config file with following categories/variables:

[paths]  

# path to aggregation id's, one per line  
agg_list=./metadata.txt  

[transforms]  

# choose custom for John Lazar's normalization script  
# choose none to use DESeq filtering  
norm_method=custom  

# if using John Lazar's normalization method,
# select true to use geomean, false for loess
geomean=true  

[deseq_options]

# experimental design for creation of DESeq exp object
experimental_design="condition + time"  

# supply required contrast argument to DESeq
contrasts="condition,WT,DNMT3+"

3. Run as:

deseq.py -c config.txt

Required Input

metadata.txt: tsv format, one aggregate ID per line, w/header, ex: agg_id | condition | any_other_grouping
- Example metadata.txt
config.txt: Config file containing the following options.
- Simple example: The minimum information required using default options.
- Complex example: Example with all configurable options.

Options

Pipeline options:

Input Master (default= None): Do you already have a master DHS reference, or should it be created from your input samples?
- "none" = create from samples
- path/to/master_dhs.bed = use this file for a reference
Threshold (default = 0): Remove any rows from master DHS reference that have loci in less than X samples
Input Options: choose between using peaks or hotspots to count overlaps between your master list
Normalization Method (default = Custom): DESeq2 normalization is tailored for RNAseq data and does not fit well to differential DHS loci data because many regions will have no DHS sites, thus making sites with DHS seem over-important. To deal with this issue, John Lazar has created a normalization method that uses the Geometric Mean or Loess smoothing methods to provide a much better fit for DHS data. This method is recommended.
- "custom" = Use John Lazar's custom normalization method
- "none" = Use DESeq normalization
Geomean (default = True): If using custom normalization method:
- "true" = use Geometric Mean
- "false" = use Loess smoothing

DESeq2 Options

Experimental design: Model used in DESeq2 Design indicates the effects we want to measure. For example the following would examine the effect of condition and time on the data.
- ex: "condition + time" The following would examine the effect of condition on the data, controlling for batch effect:
- ex: "~batch + condition" The experimental design should only contain column names and values found in your agg_id.txt file
- More information
Contrasts: Allow you to examine the differences in the experimental design between to groupings. These should be specified as a comma separated list containing (in order): "column_name,group1,group2".
- ex: "condition,treatment,time" -Save R objects: Save intermediate R objects for reference, and to assist in running contrasts with the same experiemental design more quickly. TRUE to save, FALSE to skip.
FDR: Set your FDR threshold. By default this is set to 0.01.

Example

Click here to view an example walk-thru.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
deseq		deseq
.gitattributes		.gitattributes
MANIFEST.in		MANIFEST.in
README.md		README.md
__init__.py		__init__.py
_gitignore		_gitignore
deseq2_modules.sh		deseq2_modules.sh
dhs_workflow_diagram.png		dhs_workflow_diagram.png
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DESeq for Differential Chromatin Accessibility

Run-time assumptions

Installation/Requirements

To run:

1. Create a text file of aggregate id's to analyze and their grouping metadata

2. Create a config file with following categories/variables:

3. Run as:

Required Input

Options

Pipeline options:

DESeq2 Options

Example

About

Releases

Packages

Languages

summerela/deseq2_pipeline

Folders and files

Latest commit

History

Repository files navigation

DESeq for Differential Chromatin Accessibility

Run-time assumptions

Installation/Requirements

To run:

1. Create a text file of aggregate id's to analyze and their grouping metadata

2. Create a config file with following categories/variables:

3. Run as:

Required Input

Options

Pipeline options:

DESeq2 Options

Example

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages