I have RNA-seq data from an F1 cross between C57BL/6J and CAST/EiJ mice (BxC), and I'm trying to measure allele-specific expression from each parental allele. To do this, I aligned a set of paired-end read files with the following command (edited to remove identifying information):
STAR --runThreadN 4 --genomeDir /genome/path/mm39/ --readFilesIn /path/to/fastq/files/E10.5_BxC_embryo_7_r1_val_1.fq /path/to/fastq/files/E10.5_BxC_embryo_7_r2_val_2.fq --varVCFfile /genome/path/mm39/annotations/CAST_EiJ_mm39LiftOver_snps.vcf --waspOutputMode SAMtag --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI AS nM vA vG vW --outFileNamePrefix /path/to/aligned/output/directory/E10.5_BxC_embryo_7_ --outSAMattrRGline ID:job1 PU:uk SM:E10.5_BxC_embryo_7 PL:illumina LB:uk
The VCF file is from the Mouse Genomes Project, for the record (albeit lifted over from an old version rather than properly filtered out of the latest version). My expectation based on the STAR Manual was that STAR would find SNPs that are expected to be heterozygous in my dataset (i.e. the VCF line has distinct REF and ALT alleles and the genotype in the strain column is listed as 1/1 or otherwise homozygous), then (when mapping a read spanning a variant) would decide which allele was in the read, store that information in the vA and vG tags, and then apply WASP to determine whether the read was likely to be reference-biased. Thus, the number of reads with a vA tag and the number of reads with a vW tag should be equal. However, when I counted both in the aligned output BAM, I got line counts of 7,819,242 and 8,501,502, respectively. So clearly some part of my understanding of STAR is flawed. My questions are:
What defines when STAR will assign vA/vG tags to a read?
What defines when STAR will put a read through WASP filtering?
Likely answered by the above but worth confirming: will STAR allow me to determine expression from specific parental alleles using the VCF file as-is, or do I need to generate an alternative VCF file that represents my F1 cross genome (i.e. the genotype on each VCF line of interest is edited to be 0/1)?
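For anyone wanting to reproduce this kind of comparison, here is a minimal shell sketch of how per-tag counts can be obtained from SAM text. It is not necessarily the exact commands behind the numbers above; the function names `count_tag` and `vw_breakdown` and the BAM file name in the usage comment are hypothetical. Breaking the vW counts down by value may help show which WASP outcomes account for the difference (the STAR manual lists what each vW value means).

```shell
# Count SAM records on stdin that carry a given optional tag (e.g. vA or vW).
# Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
count_tag() {
    awk -v tag="$1" -F'\t' '
        { for (i = 12; i <= NF; i++) if (index($i, tag ":") == 1) { n++; break } }
        END { print n + 0 }'
}

# Print "value<TAB>count" for each distinct vW value seen on stdin.
vw_breakdown() {
    awk -F'\t' '
        { for (i = 12; i <= NF; i++) if ($i ~ /^vW:i:/) { c[substr($i, 6)]++; break } }
        END { for (v in c) print v "\t" c[v] }' | sort -n
}

# Hypothetical usage (requires samtools; the file name is a placeholder):
#   samtools view aligned.bam | count_tag vA
#   samtools view aligned.bam | count_tag vW
#   samtools view aligned.bam | vw_breakdown
```

Both functions count alignment records (lines), not read pairs, so the two mates of a pair are counted separately.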
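If an F1-style VCF does turn out to be necessary, the genotype edit described in the last question could be sketched as below. This assumes a single-sample VCF with the strain genotype in column 10 and GT as the first FORMAT subfield; the function name `vcf_to_f1_het` and the file names are hypothetical. Whether STAR actually needs this, and whether it would expect `0/1` or a phased form such as `0|1`, is the open question, so treat this only as an illustration of the edit.

```shell
# Rewrite homozygous-alternate genotypes (1/1 or 1|1) to 0/1 in column 10 of a
# single-sample VCF read from stdin. Header lines and all other records pass
# through unchanged.
# Assumption: GT is the first colon-separated subfield of the sample column.
vcf_to_f1_het() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         {
             n = split($10, f, ":")
             if (f[1] == "1/1" || f[1] == "1|1") {
                 s = "0/1"
                 for (i = 2; i <= n; i++) s = s ":" f[i]
                 $10 = s
             }
             print
         }'
}

# Hypothetical usage (file names are placeholders):
#   vcf_to_f1_het < CAST_EiJ_snps.vcf > BxC_F1_snps.vcf
```

Records that are not homozygous-alternate are left as they are here; filtering them out instead would be an equally reasonable choice.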
Thanks in advance!