Fix `BigInt#to_s` emitting null bytes for certain values #11063

HertzDevil · 2021-08-04T22:00:17Z

mpz_sizeinbase does not always return an exact number of digits when the base argument is not a power of 2:

Return the size of op measured in number of digits in the given base. base can vary from 2 to 62. The sign of op is ignored, just the absolute value is used. The result will be either exact or 1 too big. If base is a power of 2, the result is always exact. If op is zero the return value is always 1.

This means the last byte of the returned string can be a null byte if mpz_sizeinbase is greater than the exact count:

85.to_big_i.to_s # => "85\u0000"
(-85).to_big_i.to_s # => "-85\u0000"

This PR fixes that. Follow-up to #10926.

Originally discovered in https://github.com/crystal-money/money/runs/3245218564

straight-shoota · 2021-08-04T22:17:48Z

I'm wondering if we can't just combine both branches instead of duplicating a lot of very similar code.

HertzDevil · 2021-08-04T23:05:00Z

They look very different to me and I think merging the two branches would make the logic even harder to follow.

straight-shoota · 2021-08-05T09:09:32Z

The diff really isn't that big: https://github.com/HertzDevil/crystal/compare/bug/bigint-to_s...straight-shoota:bug/bigint-to_s?w=1

HertzDevil · 2021-08-05T10:08:04Z

And now the logic of whether to reserve space for zero padding and rewrite the minus sign is hidden behind those extra clamping operations, when the intent should have been very clear: more digits (precision) are requested than there are digits (count). When reviewing the second version I have to internally read it twice based on whether offset is clamped or not, and this merged version is longer than either branch in the original, so it ends up creating more cognitive load now. Deduplication does not always make code more readable.

straight-shoota · 2021-08-05T10:25:26Z

The offset calculation via max and clamp can be simplified to a single conditional. I've added a commit to fix that. That should better express the intent.

The combined implementation adds three conditionals and a couple of simple expressions, while also simplifying some existing expressions. For me, that results in a net reduction of cognitive load because I don't have to grok two different, but very similar branches while trying to keep track of their minimal differences.

HertzDevil · 2021-08-05T11:05:58Z

That's because these aren't "minimal" differences; the two branches are dissimilar enough that they deserve different considerations, and if any portions between them do look alike, that doesn't imply it is desirable to factor out the changing parts rather than the common parts. That cognitive load does not come from studying the code's correctness.

I would especially like not to over-factor code like that when it comes to code that deals with C or unsafe memory.

straight-shoota

I'm fine with merging this either way, as it fixes a bug.

beta-ziliani

LGTM

Fix BigInt#to_s emitting null bytes for certain values

274e7d5

HertzDevil added kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:regression Something that used to correctly work but no longer works topic:stdlib:numeric labels Aug 4, 2021

straight-shoota approved these changes Aug 6, 2021

View reviewed changes

beta-ziliani approved these changes Aug 10, 2021

View reviewed changes

beta-ziliani added this to the 1.2.0 milestone Aug 10, 2021

straight-shoota merged commit 3ca4acf into crystal-lang:master Aug 11, 2021

straight-shoota mentioned this pull request Aug 11, 2021

Refactor BigInt#to_s #11081

Open

HertzDevil deleted the bug/bigint-to_s branch August 12, 2021 12:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `BigInt#to_s` emitting null bytes for certain values #11063

Fix `BigInt#to_s` emitting null bytes for certain values #11063

HertzDevil commented Aug 4, 2021

straight-shoota commented Aug 4, 2021

HertzDevil commented Aug 4, 2021

straight-shoota commented Aug 5, 2021

HertzDevil commented Aug 5, 2021

straight-shoota commented Aug 5, 2021

HertzDevil commented Aug 5, 2021

straight-shoota left a comment

beta-ziliani left a comment

Fix BigInt#to_s emitting null bytes for certain values #11063

Fix BigInt#to_s emitting null bytes for certain values #11063

Conversation

HertzDevil commented Aug 4, 2021

straight-shoota commented Aug 4, 2021

HertzDevil commented Aug 4, 2021

straight-shoota commented Aug 5, 2021

HertzDevil commented Aug 5, 2021

straight-shoota commented Aug 5, 2021

HertzDevil commented Aug 5, 2021

straight-shoota left a comment

Choose a reason for hiding this comment

beta-ziliani left a comment

Choose a reason for hiding this comment

Fix `BigInt#to_s` emitting null bytes for certain values #11063

Fix `BigInt#to_s` emitting null bytes for certain values #11063