Skip to content
christianparobek edited this page Nov 20, 2014 · 4 revisions

We need to determine depth over SNPs that occur in coding regions vs depth over SNPs that occur in non-coding regions. Why? Manske et al. found that in their 825 P. falciparum samples they had much greater depth over their coding SNPs vs their non-coding SNPs (~4x difference). We suspect the difference will not be as pronounced for us. However, we need to look.

grep "exon\|##" PlasmoDB-10.0_PvivaxSal1.gff | head -n -1 > PlasmoDB-10.0_PvivaxSal1_exons.gff
    # Make an exon-only GFF file (includes header)
    # Remove the last line of the file, since it starts with '##' 
grep "PASS\|^#" combined.qual.vcf > combined.pass.vcf
    # combined.qual.vcf has ~250K SNP entries with filtering flags
    # We need to select only the 'PASS'ing entries (and the header)
bedtools intersect -a combined.pass.vcf -b PlasmoDB-10.0_PvivaxSal1_exons.gff > combined.exons.vcf
    # Get only VCF entries in combined.pass.vcf that are in exons
bedtools intersect -v -a combined.pass.vcf -b PlasmoDB-10.0_PvivaxSal1_exons.gff > combined.nonexons.vcf
    # Get only VCF entries in combined.pass.vcf that are outside exons
Clone this wiki locally