Modified base parsing failure #1466

Closed
cjw85 opened this issue Jul 1, 2022 · 0 comments · Fixed by #1469
cjw85 commented Jul 1, 2022

The functions bam_mods_at_next_pos and bam_next_basemod fail to detect when an MM tag is corrupt for reverse-mapped reads.

For example, the records:

@SQ     SN:ref  LN:64
reverse 16      ref     1       255     21M     *       0       21      GGGGGGGTCTCTAACGACCAA   *       MM:Z:T+T?,5,0,0,0,0;    ML:B:C,182,3,4,3,3
forward 0       ref     1       255     21M     *       0       21      TTGGTCGTTAGAGACCCCCCC   *       MM:Z:T+T?,5,0,0,0,0;    ML:B:C,182,3,4,3,3

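contain malformed MM tags. Both query sequences contain only five T bases (seen as five A bases in the stored SEQ of the reverse record), yet the MM tag states that the first modification lies after skipping the first five Ts (counted as As from the right for the reverse record), i.e. on a sixth T that does not exist.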

Putting these records through the test program test/test_mod in htslib yields:

0	G
1	G
2	G
3	G
4	G
5	G
6	G
7	T
8	C
9	T
10	C
11	T
12	A	T+T3
13	A	T+T3
14	C
15	G
16	A	T+T4
17	C
18	C
19	A	T+T3
20	A	T+T182
---
Present: T
12	A	T+T3
13	A	T+T3
16	A	T+T4
19	A	T+T3
20	A	T+T182

===

0	T
1	T
2	G
3	G
4	T
5	C
6	G
7	T
8	T
9	A
10	G
11	A
12	G
13	A
14	C
15	C
16	C
17	C
18	C
19	C
20	C
---
Present: T
[W::bam_next_basemod] MM tag refers to bases beyond sequence length

It trips up only on the forward record. The reverse record is accepted without a warning, and modification calls are reported at positions 12, 13, 16, 19 and 20.

Escalated from: epi2me-labs/modbam2bed#21

jkbonfield added a commit to jkbonfield/htslib that referenced this issue Jul 6, 2022
If we have an MM tag with base-type specific coordinates beyond the
end of the sequence as there are too few bases of that type, then we
now detect this within bam_parse_basemod.

This was already checked within bam_next_basemod for forward reads,
but not spotted in reverse complemented ones.

Fixes samtools#1466
daviesrob pushed a commit that referenced this issue Jul 8, 2022