-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace_reads.py only writing out one read of each pair for paired end. #212
Comments
Apologies, there is a regression in 1.4 that causes read tracking information to be lost (ed35087#r81884432), will update when fixed. |
@BrianOSullivanGit, apologies again. I introduced this issue. We'll work on it and release a fix ASAP. |
No worries, fair enough. I'll get 1.3 going again and wait for the update before going back to 1.4. Thanks for letting me know. |
Hi, ie., After patch applied, ie.,
Pre-spike-in read pairs, $ samtools view tc.bam -r "chr1:964961-964961"| sort -k1,1 |cut -f1-4|more Post spike-in reads, $ samtools view tc.spike.sort.bam -r "chr1:964961-964961"| sort -k1,1 |cut -f1-4|more T_X2_chr1-105658166 83 chr1 964925 And there is nothing spiked in into the output etc. Post spike-in Let me know if you need a simple test case to debug the issue. I think it should show up in all paired-end data however. Thanks, Brian |
@BrianOSullivanGit, could you try the new updated version of the fix? |
Thanks for updating it. The last version of your fix appears to work. I am just running a larger test on a full bam now. Will let you know how it goes. |
Didn't mean to close, will leave open pending @BrianOSullivanGit to come back. |
Thanks guys, it seems to be working fine again now. Appreciate your help. Closing. Best regards, Brian |
I'm having an issue with release 1.4 where only one read of each pair seems to be written out by replace_reads.
In the test case below I have a simple setup with a subsetted bam etc.,
[bosullivan@lugh TEST_SMALL_TC3]$ ls -ltr
total 16
-rw-rw-r-- 1 bosullivan bosullivan 23 Aug 22 20:01 test.spike
-rw-rw-r-- 1 bosullivan bosullivan 5412 Aug 22 20:01 tc.bam
-rw-rw-r-- 1 bosullivan bosullivan 736 Aug 22 20:01 tc.bam.bai
[bosullivan@lugh TEST_SMALL_TC3]$ cat test.spike
chr1 964961 964961 0.5
I run bamsurgeon with,
python3 ../../bamsurgeon-1.4/bin/addsnv.py --procs 16 --aligner mem --picardjar /home/bosullivan/bin/picard-tools-1.131/picard.jar -v test.spike -f tc.bam -r /data/bosullivan/XXXXXXXXX/X2_HG00110.coding_exons_hg38.fa -o tc.spike.bam 2>&1 | tee run.log
It runs without error, noting,
[bosullivan@lugh TEST_SMALL_TC3]$ tail -2 tc.spike.addsnv.test.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SPIKEIN
chr1 964961 . T G 100 PASS SOMATIC;VAF=0.4827586206896552;DPR=28.0 GT 0/1
However, if you look at the spike in bam, only one of the reads from each pair in the original pile-up are there, ie.,
samtools sort tc.spike.bam > tc.spike.sort.bam; samtools index tc.spike.sort.bam
[bosullivan@lugh TEST_SMALL_TC3]$ samtools view tc.spike.sort.bam -r "chr1:964961-964961"| cut -f1 | sort | uniq -c
1 T_X2_chr1-105658166
1 T_X2_chr1-114050428
1 T_X2_chr1-118104706
1 T_X2_chr1-119021410
1 T_X2_chr1-131492998
1 T_X2_chr1-134105122
1 T_X2_chr1-134498332
1 T_X2_chr1-137335478
1 T_X2_chr1-17341428
1 T_X2_chr1-25397966
1 T_X2_chr1-27489216
1 T_X2_chr1-33319224
1 T_X2_chr1-33609412
1 T_X2_chr1-35208886
1 T_X2_chr1-37384776
1 T_X2_chr1-4360962
1 T_X2_chr1-44825172
1 T_X2_chr1-45622772
1 T_X2_chr1-46771388
1 T_X2_chr1-47331784
1 T_X2_chr1-51102978
1 T_X2_chr1-63417112
1 T_X2_chr1-68746918
1 T_X2_chr1-70114492
1 T_X2_chr1-72960300
1 T_X2_chr1-80234100
1 T_X2_chr1-8309364
1 T_X2_chr1-88295864
1 T_X2_chr1-98425500
Whereas the original, pre-spike-in bam is,
[bosullivan@lugh TEST_SMALL_TC3]$ samtools view tc.bam -r "chr1:964961-964961"| cut -f1 | sort | uniq -c
2 T_X2_chr1-105658166
2 T_X2_chr1-114050428
2 T_X2_chr1-118104706
2 T_X2_chr1-119021410
2 T_X2_chr1-131492998
2 T_X2_chr1-134105122
2 T_X2_chr1-134498332
2 T_X2_chr1-137335478
2 T_X2_chr1-17341428
2 T_X2_chr1-25397966
2 T_X2_chr1-27489216
2 T_X2_chr1-33319224
2 T_X2_chr1-33609412
2 T_X2_chr1-35208886
2 T_X2_chr1-37384776
2 T_X2_chr1-4360962
2 T_X2_chr1-44825172
2 T_X2_chr1-45622772
2 T_X2_chr1-46771388
2 T_X2_chr1-47331784
2 T_X2_chr1-51102978
2 T_X2_chr1-63417112
2 T_X2_chr1-68746918
2 T_X2_chr1-70114492
2 T_X2_chr1-72960300
2 T_X2_chr1-80234100
2 T_X2_chr1-8309364
2 T_X2_chr1-88295864
2 T_X2_chr1-98425500
The pileup is not as expected given the addsnv.test.vcf either,
[bosullivan@lugh TEST_SMALL_TC3]$ samtools mpileup -f /data/bosullivan/XXXXXXXXX/X2_HG00110.coding_exons_hg38.fa tc.spike.sort.bam -r "chr1:964961-964961"
[mpileup] 1 samples in 1 input files
chr1 964961 T 2 g^], G@
While the pre-spikein bam was,
[bosullivan@lugh TEST_SMALL_TC3]$ samtools mpileup -f /data/bosullivan/XXXXXXXXX/X2_HG00110.coding_exons_hg38.fa tc.bam -r "chr1:964961-964961"
[mpileup] 1 samples in 1 input files
chr1 964961 T 29 ............,..............^
,^, JBFDFJJJIJIDJCJE3IHJJIHDFFFD@Just wondering if you would have any ideas whats happening? I ran addsnv.py in this way previously for release 1.3 without seeing this issue. Unfortunately I do not have access to a python2.7 system to double check it at the moment.
Any help very much appreciated as I am at a complete loss. This is a simple test case however I am having issues with all snvs I try to spike in in this way. I've attached the output of the addsnv.py to this fyi.
out.log
Thanks & regards, Brian
The text was updated successfully, but these errors were encountered: