-
Notifications
You must be signed in to change notification settings - Fork 17
Detailed Use: Replication Timing log2 Coverage Ratio from Two BAM Files
Replication timing (or Timex data) sequencing can be used to identify early and late replicating genome regions. This is accomplished by comparing the sequencing depth of G1-phase synchronized cells [2N genome state] to S-phase (or asynchronous) cells [2N-4N genome state]. Early replicating regions will have more copies, and generate higher sequencing depths in the S-phase cells, resulting in "copy-number change"-like events. With BAMscale, the log2 coverage ration of two BAM files can be done in one step. To get better results, it is suggested to use smoothening as well.
In our analyses, we generally use three main parameters: 1) bin size set to 100bp, 2) set the operation to "log2" and 3) smoothening set to 500 bins. This tells BAMscale, to calculate the average of -500 to +500 bins for each bin (of length 100bp), meaning that one bin is the average of 10k bases (500bins * 100binsize * 2).
In this tutorial, we will process Replication timing data from data from:
"Dual Roles of Poly(dA:dT) Tracts in Replication Initiation and Fork Collapse"
The article can be found here, and this is the GEO profile with attached data.
The complete analysis steps can be found below
G1-phase cells Repli_seq_rB_0h_NT_rep1
S-phase cells Repli_seq_aB_48h_NT_rep1
If we have the two BAM files, we can simply generate the log2 coverage tracks by running:
Important: The second BAM file will be divided by the first BAM!!!
./BAMscale scale --bam Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam \
--bam Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam \
-t 8 \
--operation log2 \
--binsize 100 \
--smoothen 500
The command will output three BigWig files:
1) Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.scaled.bw
2) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam.scaled.bw
3) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam_vs_Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.log2.bw
As a comparison, we will add the END-seq and OK-seq data as well.
The upper track is the replication timing (log2) coverage track. The middle two tracks are from END-seq data, while the (green/red) dotplot is the OK-seq data (meaning where there is a Watson/Crick strand switch in OK-seq,there is END-seq signal as well).
Chromosome 1 (full)
The upper track is the replication timing (log2) coverage track. The while the (green/red) dotplot is the OK-seq data, while the bottom two tacks are from END-seq. As you can see in the figure, this region is early replicating, plus the END-seq and OK-seq coverages have similar signal positions.
Chromosome 1 (region from publication)
The first step is to download the samples using the sratoolkit.
fasterq-dump SRR7440351 --progress --split-3
fasterq-dump SRR7440352 --progress --split-3
Raw reads are aligned with the BWA mem aligner using default alignment settings (but multithreaded) to the hg19 human genome. Reads aligned with BWA were directly converted from SAM format to BAM format using samtools.
Sample Repli_seq_rB_0h_NT_rep1 (SRR7440351)
bwa mem -t 8 <mm10.idx> SRR7440351_1.fastq SRR7440351_2.fastq \
| samtools view -Sbh > Repli_seq_rB_0h_NT_rep1.unsorted.bam
Sample Repli_seq_aB_48h_NT_rep1 (SRR7440352)
bwa mem -t 8 <mm10.idx> SRR7440352_1.fastq SRR7440352_2.fastq \
| samtools view -Sbh > Repli_seq_aB_48h_NT_rep1.unsorted.bam
Next, we sort reads, mark duplicates, and index the BAM files.
Sample Repli_seq_rB_0h_NT_rep1 (SRR7440351)
samtools sort -o Repli_seq_rB_0h_NT_rep1.sorted.bam Repli_seq_rB_0h_NT_rep1.unsorted.bam
samtools index Repli_seq_rB_0h_NT_rep1.sorted.bam
java -Xmx8g -jar picardtools.jar MarkDuplicates \
I=Repli_seq_rB_0h_NT_rep1.sorted.bam \
O=Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam \
M=Repli_seq_rB_0h_NT_rep1.sorted.dedup.metrics
samtools index Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam
Sample Repli_seq_aB_48h_NT_rep1 (SRR7440352)
samtools sort -o Repli_seq_aB_48h_NT_rep1.sorted.bam Repli_seq_aB_48h_NT_rep1.unsorted.bam
samtools index Repli_seq_aB_48h_NT_rep1.sorted.bam
java -Xmx8g -jar picardtools.jar MarkDuplicates \
I=Repli_seq_aB_48h_NT_rep1.sorted.bam \
O=Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam \
M=Repli_seq_aB_48h_NT_rep1.sorted.dedup.metrics
samtools Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam
Finally, we generate the log2 track and scaled coverage tracks
./BAMscale scale --bam Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam \
--bam Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam \
-t 8 \
--operation log2 \
--binsize 100 \
--smoothen 500
The output of the two commands will be two BigWig files. BAMscale "scale" function appends the ".bw" suffix to the end of the input file. For these two examples, the output names will be:
1) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam.bw
2) Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.bw
3) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam_vs_Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.log2.bw
- Main page
- Home
- Installation
- Benchmarking
- Brief examples
- Detailed Manuals
- Visualization scripts