Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CallMolecularConsensusReads truncate reads #759

Closed
ruolin opened this issue Jan 12, 2022 · 6 comments
Closed

CallMolecularConsensusReads truncate reads #759

ruolin opened this issue Jan 12, 2022 · 6 comments
Assignees
Labels

Comments

@ruolin
Copy link

ruolin commented Jan 12, 2022

Hi fgbio community,

I have a single read-pair family (cD=1) which after call consensus was truncated. This is indeed weird because it should just do nothing and output the same read. To reproduce this run java fgbio-1.5.0.jar CallMolecularConsensusReads -i a.bam -o a.ccs.bam -M 1. The following is the a.bam.

@HD     VN:1.6  SO:unsorted     GO:query        SS:unsorted:template-coordinate
@SQ     SN:1    LN:249250621    AS:GRCh37       M5:1b22b98cdeb4a9304cb5d48026a85128     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:2    LN:243199373    AS:GRCh37       M5:a0d9851da00400dec1098a9255ac712e     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:3    LN:198022430    AS:GRCh37       M5:fdfd811849cc2fadebc929bb925902e5     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:4    LN:191154276    AS:GRCh37       M5:23dccd106897542ad87d2765d28a19a1     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:5    LN:180915260    AS:GRCh37       M5:0740173db9ffd264d728f32784845cd7     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:6    LN:171115067    AS:GRCh37       M5:1d3a93a248d92a729ee764823acbbc6b     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:7    LN:159138663    AS:GRCh37       M5:618366e953d6aaad97dbe4777c29375e     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:8    LN:146364022    AS:GRCh37       M5:96f514a9929e410c6651697bded59aec     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:9    LN:141213431    AS:GRCh37       M5:3e273117f15e0a400f01055d9f393768     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:10   LN:135534747    AS:GRCh37       M5:988c28e000e84c26d552359af1ea2e1d     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:11   LN:135006516    AS:GRCh37       M5:98c59049a2df285c76ffb1c6db8f8b96     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:12   LN:133851895    AS:GRCh37       M5:51851ac0e1a115847ad36449b0015864     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:13   LN:115169878    AS:GRCh37       M5:283f8d7892baa81b510a015719ca7b0b     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:14   LN:107349540    AS:GRCh37       M5:98f3cae32b2a2e9524bc19813927542e     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:15   LN:102531392    AS:GRCh37       M5:e5645a794a8238215b2cd77acb95a078     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:16   LN:90354753     AS:GRCh37       M5:fc9b1a7b42b97a864f56b348b06095e6     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:17   LN:81195210     AS:GRCh37       M5:351f64d4f4f9ddd45b35336ad97aa6de     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:18   LN:78077248     AS:GRCh37       M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:19   LN:59128983     AS:GRCh37       M5:1aacd71f30db8e561810913e0b72636d     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:20   LN:63025520     AS:GRCh37       M5:0dec9660ec1efaaf33281c0d5ea2560f     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:21   LN:48129895     AS:GRCh37       M5:2979a6085bfe28e3ad6f552f361ed74d     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:22   LN:51304566     AS:GRCh37       M5:a718acaa6135fdca8357d5bfe94211dd     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:X    LN:155270560    AS:GRCh37       M5:7e0e2e580297b7764e31dbc80c2540dd     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:Y    LN:59373566     AS:GRCh37       M5:1fa3474750af0948bdf97d5a0ee52e51     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@SQ     SN:MT   LN:16569        AS:GRCh37       M5:c68f52674c9fb33aef52dcf399755519     UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   SP:Homo Sapiens
@RG     ID:4    LB:lib1 PL:ILLUMINA     SM:A   PU:unit1
A00761:658:HLJMHDRXY:1:2124:3748:21324  99      3       184360590       60      49M23I35M       =       184360590       84      ACCATTCTAATTACAGTGGGGGAAGTGTGCGCGCAAGTGTGTGCGTGTGTGCGCACGCGCAGGTGTGTGCGTGAGTGAAGGGATGTAGACAGAGGCTGCTGGCCGGG     FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   MC:Z:49M23I35M  MD:Z:84 PG:Z:bwa        RG:Z:4  MI:Z:8193187/A  NM:i:23 MQ:i:60 UQ:i:0  AS:i:55 RX:Z:TTA-TCT
A00761:658:HLJMHDRXY:1:2124:3748:21324  147     3       184360590       60      49M23I35M       =       184360590       -84     ACCATTCTAATTACAGTGGGGGAAGTGTGCGCGCAAGTGTGTGCGTGTGTGCGCACGCGCAGGTGTGTGCGTGAGTGAAGGGATGTAGACAGAGGCTGCTGGCCGGG     FFFFFFFFFFF:FFFFFFFFFFFFFFFF:F
FFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFF:FFF:FFFFFFFFFFFFFFFFFFFFF   MC:Z:49M23I35M  MD:Z:84 PG:Z:bwa        RG:Z:4  MI:Z:8193187/A  NM:i:23 MQ:i:60 UQ:i:0  AS:i:55 RX:Z:TTA-TCT
@nh13 nh13 self-assigned this Jan 14, 2022
@nh13 nh13 added the bug label Jan 14, 2022
@nh13
Copy link
Member

nh13 commented Jan 14, 2022

@ruolin I think this is a bug. I'll update shortly.

@nh13
Copy link
Member

nh13 commented Jan 14, 2022

Buggy output, where the reads are shorter by exactly the # of bases inserted.

@HD	VN:1.6	SO:queryname	GO:none
@RG	ID:A	LB:lib1	SM:A	PL:ILLUMINA	PU:unit1
@CO	Read group A contains consensus reads generated from 1 input read groups.
lib1:8193187/A	77	*	0	0	*	*	0	0	ACCATTCTAATTACAGTGGGGGAAGTGTGCGCGCAAGTGTGTGCGTGTGTGCGCACGCGCAGGTGTGTGCGTGAGTGAAGGGAT	CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC	cD:i:1	cE:f:0.0	RG:Z:A	MI:Z:8193187/A	cM:i:1	RX:Z:TTA-TCT	cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1	ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
lib1:8193187/A	141	*	0	0	*	*	0	0	CCCGGCCAGCAGCCTCTGTCTACATCCCTTCACTCACGCACACACCTGCGCGTGCGCACACACGCACACACTTGCGCGCACACT	CCCCCCCCCCCCCCCCCCCCC9CCC9CCC9CCCCCCCCC9CCCCCCCCCCCCCCCCCCCCCCCCCCCCC99CCCCCCC9CCCCC	cD:i:1	cE:f:0.0	RG:Z:A	MI:Z:8193187/A	cM:i:1	RX:Z:TTA-TCT	cd:B:s,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1	ce:B:s,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

@nh13
Copy link
Member

nh13 commented Jan 14, 2022

@ruolin we do some upfront quality trimming, but more importantly trimming based on the insert size, so that we don't include reads on end of a pair that map past bases on the mate. But this calculate (I think) is wrong based on your example, and causes the read to be truncated/shortened by exactly the insertion in your case, and more generally the difference in mapped read bases versus mapped reference bases. I have tested your test case on the fix.

Thank-you for the very clear report and test case. I am hoping we can include the fix soon.

@ruolin
Copy link
Author

ruolin commented Jan 14, 2022

@nh13 Hi Thanks for quick update. The quality trimming and the pass the mate end trimming might explain some of the other truncations I have seen. But There is currently no option to turn the trimming off in the CallConsensus. Is that right? Will you be able to include a new option in the next release that does no trimming at all, as intact as possible? Also, I think both CallMolecularConsensusReads and CallDuplexConsensusReads are affected.

@nh13
Copy link
Member

nh13 commented Jan 14, 2022

@ruolin For quality trimming and mate-end trimming, it's always possible to add a command line option. It makes more sense for the former, but for the latter, I am not sure it makes a lot of sense to skip trimming if we have reads that past the start of its mate. I think rather than turning off the trimming, you could try running the fix to see if that works?

@ruolin
Copy link
Author

ruolin commented Jan 19, 2022

Sounds good. thanks.

@ruolin ruolin closed this as completed Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants