
ATACorrect

Mette Bentsen edited this page Jan 26, 2023 · 5 revisions

Background

Similar to other enzymes used in chromatin accessibility assays (e.g. DNaseI for DNase-seq), the Tn5 transposase harbours an inherent insertion bias. While we assume the overall insertion frequency to be driven by accessibility alone, the local cutting pattern is largely driven by the underlying sequence. This bias interferes with footprinting analysis and should therefore be corrected.

The TOBIAS ATACorrect tool corrects this bias to yield a corrected signal, which simultaneously serves to identify regions "less cut than expected", indicating Tn5 protection due to protein binding. An overview of the estimation is shown here:

[Figure: overview of the ATACorrect bias estimation]
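ATACorrect scores sequence bias against a PWM/DWM bias matrix. As a rough illustration of how a sequence motif can capture insertion bias, the sketch below builds a generic log2-odds PWM from aligned sequences and scores every window of a sequence. It is not ATACorrect's implementation: the helper names `build_pwm` and `score_positions`, the uniform background and the pseudocount are all assumptions, and ATACorrect's own motif learning (including the DWM option) is not reproduced here.

```python
import math

BASES = "ACGT"


def build_pwm(sequences, background=None, pseudocount=0.5):
    """Build a log2-odds position weight matrix from equal-length, aligned sequences.

    Returns one dict per position mapping each base to log2(observed / background).
    """
    if background is None:
        background = {base: 0.25 for base in BASES}  # assumption: uniform background
    pwm = []
    for i in range(len(sequences[0])):
        counts = {base: pseudocount for base in BASES}
        for seq in sequences:
            base = seq[i].upper()
            if base in counts:  # ignore N and other non-ACGT characters
                counts[base] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background[base]) for base in BASES})
    return pwm


def score_positions(pwm, sequence):
    """Score every window of len(pwm) bases; returns one score per window start."""
    width = len(pwm)
    seq = sequence.upper()
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Non-ACGT bases (e.g. N) contribute 0.0 to the window score
        scores.append(sum(pwm[i].get(base, 0.0) for i, base in enumerate(window)))
    return scores


if __name__ == "__main__":
    # Toy aligned sequences; real input would be sequences around observed Tn5 cutsites
    sites = ["GCTAC", "GCTGC", "GCTAC", "ACTAC"]
    pwm = build_pwm(sites)
    scores = score_positions(pwm, "TTGCTACTT")
    print(scores.index(max(scores)))  # prints 2: the window "GCTAC" matches the consensus
```

A log-odds score is positive where a base is more frequent at cutsites than in the background and negative where it is depleted, so high-scoring windows mark sequence contexts Tn5 prefers to cut.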


Example command

$ TOBIAS ATACorrect --bam test_data/Bcell.bam --genome test_data/genome.fa.gz --peaks test_data/merged_peaks.bed --blacklist test_data/blacklist.bed --outdir ATACorrect_test --cores 8

Runtime: ~3 minutes (test data, 8 cores)


Input parameters

  • --bam
    A .bam-file containing mapped ATAC-seq reads. It should have an associated .bam.bai index (otherwise one will be created).
  • --genome
    A FASTA-file containing the genome to which the sequencing reads were mapped. FASTA headers should be in the format ">chr1", ">chr2", etc. The file should have an associated .fa.fai index (otherwise one will be created). The genomic ranges of the .bam and FASTA files must match (ATACorrect will raise an error if they do not).
  • --peaks
    A .bed-file containing peak regions, i.e. the regions of interest for subsequent footprinting. The peaks are also used to calculate the reads-in-peaks ratio for normalization.
  • --blacklist
    A .bed-file containing blacklisted regions. This input is optional but highly recommended, as the estimation of the bias motif can otherwise be affected by these regions.
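The reads-in-peaks ratio mentioned under --peaks is, in its basic form, the fraction of reads that fall inside peak regions. The sketch below computes that fraction for read positions given as (chromosome, position) pairs against BED intervals (0-based, half-open). It is an illustration under assumptions: the helper names are hypothetical, peaks are assumed to be merged (non-overlapping), and exactly how ATACorrect counts reads and applies the ratio during normalization is not described on this page.

```python
from bisect import bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group BED intervals (chrom, start, end) by chromosome as sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def in_peak(index, chrom, pos):
    """True if pos lies in a peak on chrom (assumes peaks are merged / non-overlapping)."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
    return i >= 0 and pos < ends[i]   # BED end coordinate is exclusive


def reads_in_peaks_ratio(read_positions, peaks):
    """Fraction of (chrom, position) read positions that fall inside any peak."""
    index = index_peaks(peaks)
    total = 0
    inside = 0
    for chrom, pos in read_positions:
        total += 1
        if in_peak(index, chrom, pos):
            inside += 1
    return inside / total if total else 0.0


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
    reads = [("chr1", 150), ("chr1", 199), ("chr1", 200), ("chr1", 550), ("chr2", 10), ("chr3", 5)]
    print(reads_in_peaks_ratio(reads, peaks))  # 4 of 6 positions fall inside peaks
```

Sorting peaks once and using binary search keeps each lookup logarithmic in the number of peaks per chromosome, which matters when millions of reads are tested.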

The full list of input parameters can be shown by running TOBIAS ATACorrect --help.
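The --genome description requires the genomic ranges of the .bam and FASTA files to match. Assuming this means matching chromosome names and lengths, a minimal pre-flight check could compare the .fa.fai index against the BAM header before running ATACorrect. The helpers `read_fai_lengths` and `compare_references` are hypothetical; in practice the BAM lengths could be taken from a pysam `AlignmentFile` (its `references` and `lengths` attributes), but here they are passed in as a plain dict so the sketch runs without extra dependencies.

```python
def read_fai_lengths(fai_text):
    """Parse .fai index text (tab-separated: name, length, offset, linebases, linewidth)."""
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def compare_references(bam_lengths, fasta_lengths):
    """List mismatches for chromosomes in the BAM header (FASTA-only chromosomes are not reported)."""
    problems = []
    for chrom, length in bam_lengths.items():
        if chrom not in fasta_lengths:
            problems.append(f"{chrom}: present in BAM but missing from FASTA")
        elif fasta_lengths[chrom] != length:
            problems.append(f"{chrom}: length {length} in BAM vs {fasta_lengths[chrom]} in FASTA")
    return problems


if __name__ == "__main__":
    fai = "chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n"
    bam = {"chr1": 1000, "chr2": 400, "chrM": 16569}  # hypothetical BAM header lengths
    for problem in compare_references(bam, read_fai_lengths(fai)):
        print(problem)
```

An empty result means every chromosome in the BAM header is present in the FASTA with the same length.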

Output

{outdir}/

  • {prefix}_uncorrected.bw
    The uncorrected cutsite signal representing observed reads at basepair resolution (the start of every read is shifted by +4/-5 bp and counted as a cutsite). This track is normalized for sequencing depth but not corrected for Tn5 bias.

  • {prefix}_bias.bw
    The raw bias score of each position against the PWM/DWM bias matrix. This score is based purely on sequence.

  • {prefix}_expected.bw
    The expected cutsite signal given the influence of bias. It is the raw bias score scaled to the sum of cuts in the region, and can be compared directly to the uncorrected signal.

  • {prefix}_corrected.bw
    The corrected cutsite signal. It contains positive values at positions cut more than expected and negative values at positions cut less than expected.

  • {prefix}_atacorrect.pdf
    A .pdf-file showing the observed Tn5 bias before and after correction.

  • {prefix}_AtacBias.pickle
    (New in 0.11.0) A .pickle-file containing the learned bias motif as an AtacBias object. This file is intended exclusively for debugging or for use in external applications. Users who want to work with this object (NOTE: only recommended for advanced Python users!) should see atacorrect_functions.py and sequences.pyx to learn about the structure of the AtacBias and SequenceMatrix objects. The object can be initialized using AtacBias().from_pickle(<pickle_file>).
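The relationship between the uncorrected, expected and corrected tracks can be sketched in a few lines. This is a simplified illustration, not the TOBIAS code, and several parts are assumptions: the function names are hypothetical; reads are given as (strand, 5' position) with the +4 shift applied to forward reads and -5 to reverse reads; bias weights are assumed non-negative (raw PWM log-odds scores can be negative and would first need transforming); the whole region is treated as one window; depth normalization is omitted; and the corrected signal is taken as observed minus expected.

```python
def cutsites(reads, shift_fwd=4, shift_rev=-5):
    """Map reads given as (strand, five_prime_position) to shifted Tn5 cutsite positions."""
    return [pos + (shift_fwd if strand == "+" else shift_rev) for strand, pos in reads]


def count_cuts(sites, region_start, region_end):
    """Per-basepair cut counts in the half-open region [region_start, region_end)."""
    counts = [0.0] * (region_end - region_start)
    for site in sites:
        if region_start <= site < region_end:
            counts[site - region_start] += 1
    return counts


def expected_signal(bias_weights, uncorrected):
    """Distribute the region's total cuts across positions in proportion to the bias weights."""
    total_bias = sum(bias_weights)
    total_cuts = sum(uncorrected)
    if total_bias == 0:
        return [0.0] * len(uncorrected)
    return [w * total_cuts / total_bias for w in bias_weights]


def corrected_signal(uncorrected, expected):
    """Observed minus expected: positive = cut more than expected, negative = cut less."""
    return [obs - exp for obs, exp in zip(uncorrected, expected)]


if __name__ == "__main__":
    reads = [("+", 96), ("+", 96), ("-", 107), ("+", 99), ("-", 105)]
    sites = cutsites(reads)                                        # [100, 100, 102, 103, 100]
    uncorrected = count_cuts(sites, 100, 104)                      # [3.0, 0.0, 1.0, 1.0]
    expected = expected_signal([2.0, 1.0, 1.0, 1.0], uncorrected)  # [2.0, 1.0, 1.0, 1.0]
    print(corrected_signal(uncorrected, expected))                 # [1.0, -1.0, 0.0, 0.0]
```

Because the expected signal is rescaled to the same total as the observed cuts, the corrected values in a region sum to zero; a run of negative values marks positions cut less than the sequence bias alone would predict.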