Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix BigInt#to_s emitting null bytes for certain values #11063

Merged
merged 1 commit into from
Aug 11, 2021

Conversation

HertzDevil
Copy link
Contributor

mpz_sizeinbase does not always return an exact number of digits when the base argument is not a power of 2:

Return the size of op measured in number of digits in the given base. base can vary from 2 to 62. The sign of op is ignored, just the absolute value is used. The result will be either exact or 1 too big. If base is a power of 2, the result is always exact. If op is zero the return value is always 1.

This means the last byte of the returned string can be a null byte if mpz_sizeinbase is greater than the exact count:

85.to_big_i.to_s # => "85\u0000"
(-85).to_big_i.to_s # => "-85\u0000"

This PR fixes that. Follow-up to #10926.

Originally discovered in https://github.com/crystal-money/money/runs/3245218564

@HertzDevil HertzDevil added kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:regression Something that used to correctly work but no longer works topic:stdlib:numeric labels Aug 4, 2021
@straight-shoota
Copy link
Member

I'm wondering if we can't just combine both branches instead of duplicating a lot of very similar code.

@HertzDevil
Copy link
Contributor Author

They look very different to me and I think merging the two branches would make the logic even harder to follow.

@straight-shoota
Copy link
Member

@HertzDevil
Copy link
Contributor Author

And now the logic of whether to reserve space for zero padding and rewrite the minus sign is hidden behind those extra clamping operations, when the intent should have been very clear: more digits (precision) are requested than there are digits (count). When reviewing the second version I have to internally read it twice based on whether offset is clamped or not, and this merged version is longer than either branch in the original, so it ends up creating more cognitive load now. Deduplication does not always make code more readable.

@straight-shoota
Copy link
Member

The offset calculation via max and clamp can be simplified to a single conditional. I've added a commit to fix that. That should better express the intent.

The combined implementation adds three conditionals and a couple of simple expressions, while also simplifying some existing expressions. For me, that results in a net reduction of cognitive load because I don't have to grok two different, but very similar branches while trying to keep track of their minimal differences.

@HertzDevil
Copy link
Contributor Author

That's because these aren't "minimal" differences; the two branches are dissimilar enough that they deserve different considerations, and if any portions between them do look alike, that doesn't imply it is desirable to factor out the changing parts rather than the common parts. That cognitive load does not come from studying the code's correctness.

I would especially like not to over-factor code like that when it comes to code that deals with C or unsafe memory.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with merging this either way, as it fixes a bug.

Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@beta-ziliani beta-ziliani added this to the 1.2.0 milestone Aug 10, 2021
@straight-shoota straight-shoota merged commit 3ca4acf into crystal-lang:master Aug 11, 2021
@HertzDevil HertzDevil deleted the bug/bigint-to_s branch August 12, 2021 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:regression Something that used to correctly work but no longer works topic:stdlib:numeric
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants