christianparobek edited this page Oct 3, 2014 · 11 revisions

It took me a while to figure out how DepthOfCoverage works. The problem was figuring out which file types should be supplied to the -geneList and -L arguments. With help from the friendly folks on the GATK forums, it turns out that -geneList should be in RefSeq format, while -L should be in GATK's interval format (chr:start-stop). The tricky part is that each entry in the -L file should fall inside an entry in the -geneList file. This differs from how I usually use -L, which is to specify entire chromosomes. It also appears that we can use the same start and stop values for the transcript, CDS, and exon columns of the RefSeq file, and for the start-stop values of the -L ".intervals" file. If we do this, we lose coverage information specific to 5' and 3' UTRs, as well as introns.
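To illustrate the file layout described above, here is a minimal Python sketch that writes one RefSeq line and one matching interval for a single gene, using the same bounds for the transcript, the CDS, and a single exon. It assumes the RefSeq file follows the 16-column UCSC refGene layout, which stores 0-based, half-open coordinates (so only the 1-based start is shifted down by one), while the GATK interval is 1-based and inclusive. The function names and the gene and contig names in the usage are hypothetical.

```python
def refseq_line(name, chrom, start, end, strand="+"):
    """Return one refGene-style RefSeq line for a gene spanning start..end.

    start/end are 1-based and inclusive. ASSUMPTION: the RefSeq file uses the
    UCSC refGene column layout, whose coordinates are 0-based and half-open,
    so only the start is shifted down by one. Transcript, CDS, and a single
    exon all share the same bounds, so UTRs and introns are not distinguished.
    """
    s0 = str(start - 1)   # 0-based start
    e = str(end)          # half-open end equals the 1-based inclusive end
    fields = [
        "0",                # bin (placeholder)
        name,               # transcript name
        chrom,              # contig name, must match the reference FASTA
        strand,
        s0, e,              # txStart, txEnd
        s0, e,              # cdsStart, cdsEnd
        "1",                # exonCount
        s0 + ",", e + ",",  # exonStarts, exonEnds (comma-terminated lists)
        "0",                # score
        name,               # name2 (gene name)
        "cmpl", "cmpl",     # cdsStartStat, cdsEndStat
        "0,",               # exonFrames
    ]
    return "\t".join(fields)


def interval_line(chrom, start, end):
    """Return a GATK interval string, chr:start-stop (1-based, inclusive)."""
    return f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    # Hypothetical gene and contig names, for illustration only.
    print(refseq_line("gene1", "chr01", 1001, 2500))
    print(interval_line("chr01", 1001, 2500))
```

Here `interval_line("chr01", 1001, 2500)` returns `chr01:1001-2500`, which lies inside the RefSeq entry written for the same gene, as the -L/-geneList pairing requires.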

For instructions on how I made the gene and exon interval files, see this link.

Here is the code I'm currently using (September 2014). Notice that, as with other GATK walkers, we can specify a list of BAMs as the input file(s):

java -jar GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R PlasmoDB-10.0_PvivaxSal1_Genome.fasta \
    -I bamnames.list \
    -o coverage/allGenes.cov \
    -geneList coverage/PvSal1_v10.0_genes.refseq \
    -L coverage/PvSal1_v10.0_genes.intervals \
    -omitBaseOutput \
    --minMappingQuality 20 \
    --minBaseQuality 20
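The -I argument above takes bamnames.list, a plain-text file with one BAM path per line. A minimal Python sketch for building that file from a directory of BAMs follows; the directory layout and both function names are hypothetical.

```python
from pathlib import Path


def bam_list_text(paths):
    """Return the contents of a GATK .list file: one BAM path per line, sorted."""
    bams = sorted(p for p in map(str, paths) if p.endswith(".bam"))
    return "".join(p + "\n" for p in bams)


def write_bam_list(bam_dir, list_path="bamnames.list"):
    """Collect *.bam files in bam_dir (hypothetical layout) and write the list."""
    text = bam_list_text(Path(bam_dir).glob("*.bam"))
    Path(list_path).write_text(text)
    return text


if __name__ == "__main__":
    # Prints a.bam, then b.bam; the .bai index file is skipped.
    print(bam_list_text(["b.bam", "a.bam", "a.bam.bai"]), end="")
```

Sorting the paths keeps the sample order stable between runs, which makes the per-sample output columns easier to compare.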

Analysis Approach:

  1. Run DepthOfCoverage on all 60 Oddar Meanchey samples to identify which samples should be excluded. The exclusion criterion was a median coverage over all coding regions of ≤ 10. Do this in R.
  2. Remove those samples and reanalyze on a per-gene basis. Exclude any gene with less than 60% of its length covered to at least 15x depth in at least 40 samples, and remove these genes from further analysis. Of the genes we're analyzing, ~30 actually overlap another gene, and these weren't analyzed by DepthOfCoverage; I sent a question to the forum about that. I also still need to remove paralogous genes.
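The two filtering steps above were done in R; the Python sketch below only illustrates the logic. It assumes the per-sample medians and the per-gene, per-sample fraction of bases at ≥15x have already been extracted from the DepthOfCoverage output into plain dictionaries (these input shapes and all function names are hypothetical). It reads the step 2 criterion as "keep a gene only if at least 40 samples have at least 60% of its length at ≥15x", which is one interpretation of that wording. A helper for flagging genes that overlap another gene, like the ~30 mentioned in step 2, is included as well.

```python
from collections import defaultdict

# Thresholds from the analysis steps above.
MAX_EXCLUDED_MEDIAN = 10  # step 1: drop samples whose median coverage is <= 10
MIN_FRACTION = 0.60       # step 2: share of gene length at >= 15x depth
MIN_SAMPLES = 40          # step 2: samples that must reach MIN_FRACTION


def samples_to_exclude(median_coverage, max_median=MAX_EXCLUDED_MEDIAN):
    """Step 1: samples whose median coverage over coding regions is <= max_median.

    median_coverage: dict of sample name -> median depth (hypothetical input).
    """
    return sorted(s for s, med in median_coverage.items() if med <= max_median)


def genes_passing(frac_at_15x, min_fraction=MIN_FRACTION, min_samples=MIN_SAMPLES):
    """Step 2 (one reading): keep a gene only if at least min_samples samples
    have at least min_fraction of its bases covered at >= 15x.

    frac_at_15x: dict of gene -> {sample: fraction in [0, 1]}, computed over
    the samples kept after step 1 (hypothetical input).
    """
    kept = []
    for gene, per_sample in frac_at_15x.items():
        n_ok = sum(1 for f in per_sample.values() if f >= min_fraction)
        if n_ok >= min_samples:
            kept.append(gene)
    return sorted(kept)


def overlapping_genes(genes):
    """Return names of genes whose 1-based inclusive spans overlap another gene.

    genes: iterable of (name, chrom, start, end). Genes are swept in start
    order per contig while tracking the gene that reaches furthest so far.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))
    overlapping = set()
    for spans in by_chrom.values():
        spans.sort()
        reach_end, reach_name = None, None  # furthest-reaching gene so far
        for start, end, name in spans:
            if reach_end is not None and start <= reach_end:
                overlapping.update((name, reach_name))
            if reach_end is None or end > reach_end:
                reach_end, reach_name = end, name
    return overlapping
```

Because the sweep only compares each gene with the furthest-reaching gene before it, it runs in O(n log n) per contig rather than comparing every pair of genes.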