R4R stake invariance bug #4389
Conversation
Codecov Report
```diff
@@            Coverage Diff            @@
##           master   #4389      +/-  ##
=========================================
+ Coverage    57.2%   58.09%    +0.88%
=========================================
  Files         241      235        -6
  Lines       15125    14881      -244
=========================================
- Hits         8653     8645        -8
+ Misses       5842     5606      -236
  Partials      630      630
```
Glad this single seed is drastically increasing coverage! That's the whole point of our AI-driven explorations ;-)
So what's happening is that this is an edge case of the same bug as in #4094. In this particular case, the reference calculation:
I'm not really concerned about this difference, so I've simply expanded the definition of
Failing CI @rigelrozanski
Concerned about this approach
x/distribution/keeper/delegation.go
Outdated
```go
// currentStake. This amount of error is tolerated and corrected for,
// however any greater amount should be considered a breach in expected
// behaviour.
if stake.Equal(currentStake.Add(sdk.SmallestDec())) {
```
Are we sure that our rounding error will never exceed `sdk.SmallestDec()`? Why?
IMO the approach of always rounding in the same direction is much safer, is there a reason we can't do that?
IMO the approach of always rounding in the same direction is much safer, is there a reason we can't do that?
We are always rounding in the same direction here. `currentStake.Add(sdk.SmallestDec())` is effectively rounding up by one smallest decimal point no matter what. The whole edge case of this bug is that sometimes `currentStake` and `currentStakeRoundUp` were the same value, where due to the inconsistencies in calculating stake across multiple periods it would round up slightly greater. Note that the greatest value that `currentStakeRoundUp` could ever have achieved is `currentStake.Add(sdk.SmallestDec())`; now we're just capping it at that value.
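As a minimal sketch of that cap (plain int64 fixed-point standing in for `sdk.Dec`; the function and variable names here are illustrative, not the SDK's):

```go
package main

import "fmt"

const smallestDec = int64(1) // one unit in the last decimal place

// capStake mirrors the correction discussed above: if the reference stake
// exceeds currentStake by exactly one smallest-decimal unit, treat it as
// tolerated rounding error and clamp it back down to currentStake.
func capStake(stake, currentStake int64) int64 {
	if stake == currentStake+smallestDec {
		return currentStake
	}
	return stake
}

func main() {
	fmt.Println(capStake(1000001, 1000000)) // off by one unit -> clamped to 1000000
	fmt.Println(capStake(1000000, 1000000)) // equal -> unchanged, prints 1000000
}
```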
Are we sure that our rounding error will never exceed sdk.SmallestDec()? Why?
My current understanding of this discrepancy is that it's probabilistic: based on specific uses of validator periods, the rounding errors can compound to ultimately increase the lowest decimal place by more than 1. I've not been able to show how this could exceed one decimal point, but I also don't have a proof that it is contained within 1 decimal point. During all simulations, however, we have never seen greater than a 1 decimal point increase; only here are we seeing a 1 decimal point increase from `currentStakeRoundUp`, which holds the same value as `currentStake` (see the reference calculation #4389 (comment)).
If you have any specific alternative suggestions I'm open to them; however, I honestly don't see a viable alternative given the current distribution mechanism design.
edit: we could potentially temporarily increase the number of decimal places used within this calculation in `calculateDelegationRewards`; however, it would require significant modifications to decimal. I'm not prepared to take this on atm, but it would probably do the trick.
Opened up issue #4399 - I think that's the long-term solution; however, I don't have short-term capacity for that, and this is still a bug, so I suggest we merge this as is until performing the changes to decimal.
We are always rounding in the same direction here.
I don't mean in the sanity check, I mean in all other instances contributing to stake
where due to the inconsistencies in calculating stake across multiple periods it would round up slightly greater
It's these "inconsistencies" that I'm concerned about - shouldn't it be possible to ensure that they are all "inconsistent" in the same (safe) direction?
If you have any specific alternative suggestions I'm open to them, however I honestly don't see a viable alternative given the current distribution mechanism design.
Yes, in principle I don't see how we can't ensure that all operations round in a particular, predetermined direction if in fact they will lose precision. Is there a counterexample of that?
My guess is that we ought to be rounding in some direction somewhere where we aren't.
My guess is that we ought to be rounding in some direction somewhere where we aren't.
I don't mean in the sanity check, I mean in all other instances contributing to stake
Well, without guessing, aka by reviewing the code for the Nth time, I can say with strong confidence that this guess is incorrect: in all calculations which contribute to forming the value held in the `stake` object, we round in a direction which reduces the value of stake. Here, let me baby you through the important steps involved:
```go
// event.Fraction comes from updatedFraction;
// increasing event.Fraction decreases the multiplier, decreasing stake
stake = stake.MulTruncate(sdk.OneDec().Sub(event.Fraction))

// decreasing currentMultiplicand increases updatedFraction, which decreases stake
// decreasing newMultiplicand increases updatedFraction, which decreases stake
// the use of MulTruncate here increases updatedFraction, which decreases stake
updatedFraction := sdk.OneDec().Sub(currentMultiplicand.MulTruncate(newMultiplicand))

// currentFraction comes from updatedFraction (recursive)
currentMultiplicand := sdk.OneDec().Sub(currentFraction)

// fraction comes from effectiveFraction;
// increasing fraction decreases newMultiplicand, which increases updatedFraction, which decreases stake
newMultiplicand := sdk.OneDec().Sub(fraction)

// here the use of QuoRoundUp increases effectiveFraction, which (see above) decreases stake;
// tokensToBurn is what actually gets applied to the tokens within staking
effectiveFraction := tokensToBurn.ToDec().QuoRoundUp(validator.Tokens.ToDec())
```
https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/distribution/keeper/delegation.go#L87
https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/distribution/keeper/validator.go#L111
https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/staking/keeper/slash.go#L109
Yes, in principle I don't see how we can't ensure that all operations round in a particular, predetermined direction if in fact they will lose precision. Is there a counterexample of that?
The counterexample is not of concern to us, as it reduces the value of stake, which is a situation we ignore. However, based on how the code is structured (see the code snippets above), it's very clear that stake will very often be < currentStake based on our rounding direction.
It's these "inconsistencies" that I'm concerned about - shouldn't it be possible to ensure that they are all "inconsistent" in the same (safe) direction?
If we increase the decimal precision used in this calculation, we can ensure that the inconsistencies are tolerable, and this special correction factor:
```go
if stake.Equal(currentStake.Add(sdk.SmallestDec())) {
	stake = currentStake
}
```
can be removed. See #4399 for more thoughts on that.
tbh I think I'm just about through going over this code and these explanations; I'm very confident this solution makes sense. Of course, I invite you to implement an alternative solution if you have strong justifications (or implement #4399, which I'll probably be taking on within the next few releases either way).
Thanks @rigelrozanski - sorry if I come across as obstinate, there's just a lot of room for error in multi-stage rounding (including possible bugs in code I previously wrote).
Is it possible to change the case which causes `stake.GT(currentStake)` - https://github.com/cosmos/cosmos-sdk/pull/4389/files#diff-de71192adba700b3255b1eab62bef50eR100 - to sometimes be true? If "all calculations which contribute to forming the value held in the stake object, we round in a direction which reduces the value of stake", then we should never hit this conditional, unless `currentStake` was also rounded down, right?
The old comment discusses stake being multiplied by slash fractions - https://github.com/cosmos/cosmos-sdk/pull/4389/files#diff-de71192adba700b3255b1eab62bef50eL101 - is that referring to `currentStake` being multiplied by a slash fraction, rounded down, and ending up less than `stake`? Have we considered instead rounding up when slashes happen (but continuing to round down when accounting in F1)? If that's safe, it might save us this check.
My current understanding of this discrepancy is that it's probabilistic: based on specific uses of validator periods, the rounding errors can compound to ultimately increase the lowest decimal place by more than 1. I've not been able to show how this could exceed one decimal point, but I also don't have a proof that it is contained within 1 decimal point. During all simulations, however, we have never seen greater than a 1 decimal point increase; only here are we seeing a 1 decimal point increase from `currentStakeRoundUp`, which holds the same value as `currentStake`.
This is my specific concern. I guess this concern doesn't need to block this PR, since whatever such bug, if present, will have previously existed regardless - this PR at minimum reduces the likelihood, I agree about that. My line of questioning is as to whether we can completely eliminate the possibility of a discrepancy here.
Does that make sense?
no worries on being thorough - I guess I was just looking for more thoughtful criticisms off the bat, but I do now understand what you're referring to.
Is it possible to change the case which causes stake.GT(currentStake) - to sometimes be true? If "all calculations which contribute to forming the value held in the stake object, we round in a direction which reduces the value of stake", then we should never hit this conditional, unless currentStake was also rounded down, right?
So `stake.GT(currentStake)` will often be true when the calculations are working as expected. We hit this conditional in a non-erroring (expected) way now because we are kind of "splitting up" the whole act of slashing stake within the distribution mechanism by validator periods. Without ever rounding in a direction which would increase this stake, just the fact that we've split this up means that where the slashing is not split up per delegation, as it is in distribution, the combined slashing can slash more, which is inherent to slashing for all delegations simultaneously instead of by period. When we break out slashing by period, even if we round down for each slash fraction, it's possible, due to how much is being rounded, that we slash less when slashing by period than when we slash without periods. In other words, the full slash and the slashing by period could both be rounding down, but the slashing by period is simply rounding down less.
I don't know how else I can explain this short of creating a numerical example (which you should try for yourself!)... But I get it, like this is a really bizarre occurrence.
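Taking up the suggestion to try a numerical example: here is one possible toy illustration (plain int64 fixed-point standing in for `sdk.Dec`; the numbers and helper names are invented for illustration, not taken from the chain). It shows how two computations that each only ever round down can still disagree, with the split-up path rounding down less and so retaining slightly more stake:

```go
package main

import "fmt"

// Toy fixed-point arithmetic with 3 decimal places, standing in for sdk.Dec.
// All values are scaled by prec; mulTruncate always rounds toward zero,
// mirroring Dec.MulTruncate.
const prec = int64(1000)

func mulTruncate(a, b int64) int64 { return a * b / prec }

func main() {
	stake := int64(999000) // 999.000
	m := int64(667)        // per-period multiplier (1 - slashFraction) = 0.667

	// Split path: apply the multiplier once per period, truncating at each
	// step (loosely how distribution reconstructs stake across periods).
	split := mulTruncate(mulTruncate(stake, m), m)

	// Combined path: truncate the product of the multipliers first, then
	// apply it once to the stake.
	combined := mulTruncate(stake, mulTruncate(m, m))

	fmt.Println(split)    // 444444, i.e. 444.444
	fmt.Println(combined) // 443556, i.e. 443.556
	// Both paths only ever round down (the true value is 444.444111...),
	// yet the split path rounds down less, leaving a slightly larger stake.
}
```

The discrepancy comes from the combined path truncating the small multiplier product before it is amplified by the large stake value, which is the same flavor of magnitude-of-rounding effect described above.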
is that referring to "currentStake" being multiplied by a slash fraction, rounded down, and ending up less than stake? Have we considered instead rounding up when slashes happen (but continuing to round down when accounting in F1) - if that's safe, it might save us this check.
This wouldn't affect this scenario. For the same reasons as explained above, this bug doesn't have so much to do with the direction of rounding (which is all set up to avoid this panic condition) but with the magnitude of rounding, which changes because we are splitting up the slashing by periods.
This is my specific concern. I guess this concern doesn't need to block this PR, since whatever such bug, if present, will have previously existed regardless - this PR at minimum reduces the likelihood, I agree about that. My line of questioning is as to whether we can completely eliminate the possibility of a discrepancy here.
Does that make sense?
I understand your thinking, but for the bizarre reasons I've explained, until we increase the precision for this calculation, we're going to have to live with this solution AFAICT
When we break out slashing by period, even if we round down for each slash fraction, it's possible due to how much is being rounded that we slash less when slashing by period instead of for when we slash without periods. In other words, The full slash, and the slashing by period could both be rounding down but the slashing by period is simply rounding down less.
Ah now I see, thanks - yes, that would be possible, makes sense.
Can you copy your above explanation into the comment (it should have been there in the first place, which is probably on me)?
I understand your thinking, but for the bizarre reasons I've explained, until we increase the precision for this calculation, we're going to have to live with this solution AFAICT
Right, better precision would possibly enable a tighter bound. I wonder if we should actually increase the bound (out of which we panic) given the possible accumulated rounding errors and the unlikelihood of this being exploitable (since slashes, which are economically costly, are required), but if this never fails in simulation it does seem unlikely to happen in practice.
Yes this conversation has brought further clarity - I will improve the comment to incorporate thoughts from our conversation.
Right, better precision would possibly enable a tighter bound.
That's the thinking: the hope is that by the end of the calculation, when the high-precision decimal is reduced back to the default decimal places, the error within the bound is eliminated.
I wonder if we should actually increase the bound (out of which we panic) given the possible accumulated rounding errors and the unlikelihood of this being exploitable (since slashes, which are economically costly, are required)
I would be comfortable increasing it up to 5 smallest points; this should entirely eliminate the possibility of this error. Maybe we include 3 right now?
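A hedged sketch of what that widened tolerance might look like (int64 fixed-point standing in for `sdk.Dec`; the 3-unit margin follows the suggestion above, and the function names are hypothetical, not the merged code):

```go
package main

import "fmt"

const smallestDec = int64(1) // one unit in the last decimal place

// withinTolerance reports whether stake exceeds currentStake by at most
// margin smallest-decimal units; anything greater would indicate a real
// invariance breach rather than accumulated rounding error.
func withinTolerance(stake, currentStake, margin int64) bool {
	return stake <= currentStake+margin*smallestDec
}

func main() {
	margin := int64(3) // tolerate up to 3 smallest decimal points
	fmt.Println(withinTolerance(1000003, 1000000, margin)) // true: clamp stake to currentStake
	fmt.Println(withinTolerance(1000004, 1000000, margin)) // false: treat as a bug and panic
}
```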
* add trouble seed
* currentStakeRoundUp is now always at least currentStake + smallest-decimal-precision
* remove unused code
* remove debugs
* @alexanderbez comment
* compile fix
* better comment, increase tolerance to 3 smallest decimal points
closes #4383
`onOperation` is not a global var but a flag - `SimulateEveryOperation`
test command:
```shell
go test ./simapp/... -run TestFullAppSimulation -SimulationEnabled=true -SimulationNumBlocks=500 -SimulationGenesis= -SimulationVerbose=true -SimulationCommit=true -SimulationSeed=8090485 -SimulationPeriod=50 -v -timeout 24h
```
docs/
`clog add [section] [stanza] [message]`