-
Notifications
You must be signed in to change notification settings - Fork 17
Benchmarking
For benchmarking and validation purposes, we processed published ATAC-seq data, quantified peak coverages, and generated scaled coverage tracks. To provide comparisons, we used BEDtools for validation of performance and raw read counts, and developed coverage-track generation benchmarks with IGVTools and Deeptools (bamCoverage).
Data from ATAC-seq was used:
"SLFN11 Blocks Stressed Replication Forks Independently of ATR. Mol Cell, 2018"
Sample processing was done as explained here.
In the six processed ATAC-seq samples, a total of 32,819 peaks were identified (transposase-accessible chromatin regions).
Since BAMscale has multi-thread capabilities, we quantified peaks using 1, 4, and 8 threads. BEDtools' multicov is a single threaded program, and was run once. BAMscale is >2-fold quicker than BEDtools using 4 threads, with similar performance using one thread. It is important to note, that BEDtools generates raw read counts only, while BAMscale outputs three different normalized tables as well (FPKM, TPM and library size). We suggest using 4 threads, since higher thread counts have minimal impact on performance.
Correlation of raw read counts between BEDTools and BAMscale was above 0.999 in all cases.
Next we compared performance of coverage track generation to IGVTools and Deeptools bamCoverage programs. IGVTools can only generate unscaled tracks, with no read filter functionality. Deeptools "bamCoverage" on the other hand is capable of filtering reads and output scaled coverage tracks.
BAMscale using a single thread outperforms IGVtools, and outperforms deeptools (using 4 threads) by ~7-fold using a single thread. We see some performance increase by adding additional threads to BAMscale, but >4 threads is not suggested.
Results from the three methods output related results, with some minor differences in track height, due to differences in binning technique and scaling.
Raw sequencing reads were downloaded using the fasterq-dump program sratoolkit, and aligned using the BWA mem aligner using default alignment settings (but multithreaded) to the hg19 human genome. Reads aligned with BWA were directly converted from SAM format to BAM format using samtools.
bwa mem -t 8 <hg19.idx> <PAIR1.fq> <PAIR2.fq> \
| samtools view -Sbh \
> out.unsorted.bam
The aligned and unsorted reads were then sorted by samtools. Sorted reads were then indexed, duplicate reads were marked using picard-tools, and indexed one additional time.
samtools sort -o out.sorted.bam out.unsorted.bam
samtools index out.sorted.bam
java -Xmx8g -jar picardtools.jar MarkDuplicates \
I=out.sorted.bam \
O=out.sorted.dedup.bam \
M=out.sorted.dedup.metrics
samtools out.sorted.dedup.bam
Peaks were called using the MACS2 peak caller. Since the data was paired-end, the "BAMPE" flag was set.
macs2 callpeak -t out.sorted.dedup.bam --outdir outdir \
-f BAMPE -g hs --cutoff-analysis -q 0.01 \
Called peaks files were merged to create one BED file for all samples (using the unix cat command). Since many peak positions are common across samples, peaks were first sorted based on their position, then merged into a unique set of peaks using bedtools.
cat *.narrowPeak \
| bedtools sort \
| bedtools merge \
> merged_peaks.bed
Peaks were quantified using the BAMscale "cov" function. The output of this step are tables with: raw read counts, FPKM, TPM and library-size scaled peak-scores. For comparison, we calculated raw read counts using the BEDTools "multicov" program.
./BAMscale cov --bed merged_peaks.bed \
--bam SAMPLE1.sorted.dedup.bam \
--bam SAMPLE1.sorted.dedup.bam \
--bam SAMPLEn.sorted.dedup.bam
bedtools multicov -bed merged_peaks.bed \
-p \
-bams SAMPLE1.sorted.dedup.bam SAMPLE2.sorted.dedup.bam SAMPLE3.sorted.dedup.bam \
> Raw_coverages.bedtools.txt
Scaled coverage tracks for each BAM file were generated with the BAMscale "scale" function. The default settings were utilized, meaning the coverages in bins were scaled to the number of aligned bases and divided by the gene length. As comparison, we generated coverage tracks with IGVtools (using the hg19 chromSize file) and deeptools bamcoverage
./BAMscale scale --bam SAMPLE1.sorted.dedup.bam
igvtools count -z 5 SAMPLE1.sorted.dedup.bam SAMPLE1.sorted.dedup.bam.tdf hg19.chrom.sizes
bamCoverage --ignoreDuplicates --binSize 5 --effectiveGenomeSize 2864785220 \
-o SAMPLE1.sorted.dedup.bam.bw \
--normalizeUsing RPGC \
-b SAMPLE1.sorted.dedup.bam
- Main page
- Home
- Installation
- Benchmarking
- Brief examples
- Detailed Manuals
- Visualization scripts