invalid base (char): '=' / erroneous 'Failed to populate reference' condition when calling from CRAM files #109
Comments
The error message should refer to an '=' base in one of the input reads. The ambiguous bases in the reference should not trigger an error (any non-[ACGT] base in the reference is just converted to N). Although '=' in the reads is in the BAM spec, we do not support it, in part because we've never encountered this in any real-world workflow. This is the second report we have of this error occurring from CRAM input. Can you describe how this CRAM file was created (i.e., what was the mapper, and what software was used to convert SAM/BAM to CRAM)? If the use of '=' in the reads is going to become a common side effect of CRAM compression, we should prioritize supporting this. |
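For readers following along, the reference handling described in the comment above (any non-[ACGT] reference base becomes N) can be illustrated with a minimal Python sketch. This is not Manta's code: the function name `normalize_reference` is hypothetical, and uppercasing the input first is an assumption made for the illustration.

```python
def normalize_reference(seq: str) -> str:
    """Return seq uppercased, with every character outside ACGT replaced by N.

    Hypothetical helper for illustration only; it is not taken from Manta.
    Uppercasing soft-masked (lowercase) bases first is an assumption.
    """
    return "".join(base if base in "ACGT" else "N" for base in seq.upper())


# IUPAC ambiguity codes (W, M, B), '=' and a lowercase 'n' all collapse to N.
print(normalize_reference("ACGTWMB=n"))
```

Under this kind of normalization, ambiguity codes such as W, M, or B in the reference never reach the caller as distinct symbols, which is why they should not trigger the "invalid base" error on their own.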
I believe it's samtools, but I'm double checking. |
Thanks, we haven't observed this yet but will try to reproduce with the latest samtools/htslib release. |
Hi @bw2 - We haven't been able to reproduce the problem, if you can identify the mapper/cram compression tool that is producing this error we'll look further into it. -c |
I've also run into this same error.
The steps from FASTQ to CRAM that read/write BAM or CRAM are from the Broad Institute's wgs-germline-snps-indels-PairedEndSingleSampleWorkflow.wdl: bwa mem -> picard MergeBamAlignment -> picard MarkDuplicates -> picard SortSam -> gatk4 ApplyBQSR -> picard GatherBamFiles -> samtools
samtools command:
Running picard's ValidateSamFile in verbose mode (only ignoring MISSING_TAG_NM) returns no errors in the CRAM file produced by samtools. All software was installed from bioconda:
binary release of manta 1.2.2 was downloaded from the release page on github. System: |
Thanks Anthony, Sorry just getting to this now. Any chance you can share this CRAM file? |
Also got this error.
I have used
|
Hi Joon, Thanks for reporting this. Can you share any data that would reproduce this issue? Your error message helpfully points out that there is an internal htslib error preceding this issue ("Failed to populate reference for id XXX"). We already have some improved exception messages and an update to htslib 1.6 in the next manta release, but I will check if we can do more to appropriately capture this particular error from htslib. In the meantime, being able to reproduce this issue from the alignment files would be the most effective way to create a reliable fix, if you can share them. |
Thanks to additional details from @joonan30 and others on this thread, I have been able to confirm that the issue arises from an htslib behavior. This does not reflect the use of the '='/ANY symbol in the source BAM/CRAM file. I have described the issue for htslib here samtools/htslib#654 and it has been reproduced outside of manta there. I will reopen this issue in recognition of the linked htslib issue, and close this once it is resolved in htslib and the update is forwarded into manta's htslib copy. |
Hi Chris, Thanks for checking this. I noticed the new release, which I guess does not include a fix for this issue? |
The fix is in process, but it is not in the Manta v1.3.0 release. With your help I was able to generate htslib issue samtools/htslib#654 describing the error details in htslib, and the corresponding fixes to htslib just merged today (here: samtools/htslib#655). I'd like us to update manta's htslib from this merge point and issue a patch update to Manta, but it will take at least a few days to fully turn the crank on this process. |
The fix for this issue is now in manta's development branch. A preview of the fix is pushed to github here: https://github.com/Illumina/manta/tree/issue109fixPreview It will be included in a patch update to v1.3.0, coming soon. |
Released updated htslib with v1.3.1 today, this should close the CRAM stability issues described here. |
I've tried running Manta with default settings on multiple GRCh38-aligned WGS .crams, and am always getting this error:
Glancing at the source code, it looks like Manta only supports ACGTN bases
manta/src/c++/lib/blt_util/seq_util.hh
Line 182 in 93392b5
?
When I grep through the GRCh38 fasta though, I find a few other bases like W, M, B, etc.:
Either way, I'm not seeing '=' chars in the reference, and would appreciate any help with debugging this.
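As a more precise alternative to grepping, the check described above (which symbols other than A, C, G, T, N occur in the reference, and whether '=' is among them) could be done with a short Python sketch like the one below. The function name `count_unexpected_bases` and the file path in the usage comment are hypothetical, and treating lowercase (soft-masked) bases as their uppercase equivalents is an assumption.

```python
from collections import Counter
from typing import Iterable


def count_unexpected_bases(fasta_lines: Iterable[str], allowed: str = "ACGTN") -> Counter:
    """Count sequence characters that are not in `allowed`.

    Header lines (starting with '>') are skipped. Bases are uppercased
    before comparison, which is an assumption about soft-masked input.
    """
    counts: Counter = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # FASTA header, not sequence
        for base in line.strip().upper():
            if base not in allowed:
                counts[base] += 1
    return counts


# Usage on a real file (the path is a placeholder):
# with open("GRCh38.fa") as handle:
#     print(count_unexpected_bases(handle))
```

If '=' does not appear in the resulting counts, the '=' in the error message is not coming from the reference FASTA itself.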