Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug counting alignments in cram files #1048

Closed
matthdsm opened this issue Sep 24, 2021 · 3 comments · Fixed by #1054
Closed

bug counting alignments in cram files #1048

matthdsm opened this issue Sep 24, 2021 · 3 comments · Fixed by #1054

Comments

@matthdsm
Copy link

Hi,

I noticed a discrepancy when loading a cram vs bam file.
BAM:

  • pysam v0.16.0.1
reads_file = pysam.AlignmentFile(infile, "rb")
reads_file.mapped
176197222
reads_file.unmapped
2958898
  • samtools v1.12
# mapped reads
samtools view -c -F 260 sample.bam 
176197222

# unmapped reads
samtools view -c -f 4 sample.bam 
2958898

CRAM:

  • pysam v0.16.0.1
reads_file = pysam.AlignmentFile(infile, "rc", reference_filename=reference)
reads_file.mapped
0
reads_file.unmapped
6073771974947137635
  • samtools v1.12
# mapped reads
samtools view -c -F 260  -T reference.fa sample.cram 
176197222

# unmapped reads
samtools view -c -f 4   -T reference.fa sample.cram 
2958898

Is there a way to fix this? HTSlib seems to parse everything correctly

Cheers
M

@jmarshall
Copy link
Member

CRAM indexes do not contain the mapped/unmapped counts that are being used to calculate these total count statistics. So the reasonable thing to do would be to leave the totals as 0 / None / raise an exception in this case.

@matthdsm
Copy link
Author

Hi John,

Ok, makes sense, but what about the huge number of unmapped reads?
It's way bigger than the total number of reads.

Thanks
M

@jmarshall
Copy link
Member

jmarshall commented Sep 28, 2021

It's almost as though it's returning uninitialised memory…
This is a minor HTSlib bug; see samtools/htslib#1335.

jmarshall added a commit that referenced this issue Nov 12, 2021
For .crai indexes, hts_idx_get_stat() and hts_idx_get_n_no_coor()
return 0 as these indexes don't record these statistics. Mention
this in the documentation for the `AlignmentFile` methods that
use these routines. Clarifies #1048 and #1060.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants