Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix fuzz integer overflow in cram encoder. #1775

Merged
merged 1 commit into from
May 2, 2024

Conversation

jkbonfield
Copy link
Contributor

Input files with very long CIGAR strings and consensus generated embedded reference can lead to exceptionally long CRAM blocks which overflow the check for large size fluctuations (to trigger new compression metric assessments).

Reformulated the expression to avoid scaling up values.

Credit to OSS-Fuzz
Fixes oss-fuzz 68225

cram/cram_io.c Outdated
Comment on lines 1988 to 1989
((b->uncomp_size + 1000)/4 > metrics->input_avg_sz+1000 ||
b->uncomp_size + 1000 < (metrics->input_avg_sz+1000)/4) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly, I think this can still overflow due to the + 1000 on both sides. However, by expanding out the brackets and subtracting 1000 from both sides it's possible to get something that behaves itself:

Suggested change
((b->uncomp_size + 1000)/4 > metrics->input_avg_sz+1000 ||
b->uncomp_size + 1000 < (metrics->input_avg_sz+1000)/4) &&
(b->uncomp_size/4 - 750 > metrics->input_avg_sz ||
b->uncomp_size < metrics->input_avg_sz/4 - 750) &&

Note that this relies on both b->uncomp_size and metrics->input_avg_sz being signed values. As the next part also has this requirement I don't think it's a major problem, but we might want to bear it in mind if we ever wanted to change either to be unsigned (e.g. size_t).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've worked through the expansion and I agree with the same thing. It's a bit obscure, so I'll add a comment.

Input files with very long CIGAR strings and consensus generated
embedded reference can lead to exceptionally long CRAM blocks which
overflow the check for large size fluctuations (to trigger new
compression metric assessments).

Reformulated the expression to avoid scaling up values.

Credit to OSS-Fuzz
Fixes oss-fuzz 68225
@daviesrob
Copy link
Member

Rebased to pick up the recent test fixes...

@daviesrob daviesrob merged commit 1e7efc0 into samtools:develop May 2, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants