R4R stake invariance bug #4389
Conversation
Codecov Report
```diff
@@            Coverage Diff            @@
##           master   #4389      +/-  ##
=========================================
+ Coverage    57.2%   58.09%    +0.88%
=========================================
  Files         241      235        -6
  Lines       15125    14881      -244
=========================================
- Hits         8653     8645        -8
+ Misses       5842     5606      -236
  Partials      630      630
```
Glad this single seed is drastically increasing coverage! That's the whole point of our AI-driven explorations ;-)
So what's happening is that this is an edge case of the same bug as in #4094. In this particular case, the reference calculation:
I'm not really concerned about this difference, so I've simply expanded the definition of
Failing CI @rigelrozanski
Concerned about this approach
x/distribution/keeper/delegation.go
Outdated
```go
// currentStake. This amount of error is tolerated and corrected for,
// however any greater amount should be considered a breach in expected
// behaviour.
if stake.Equal(currentStake.Add(sdk.SmallestDec())) {
```
Are we sure that our rounding error will never exceed `sdk.SmallestDec()`? Why?
IMO the approach of always rounding in the same direction is much safer, is there a reason we can't do that?
IMO the approach of always rounding in the same direction is much safer, is there a reason we can't do that?
We are always rounding in the same direction here. `currentStake.Add(sdk.SmallestDec())` is effectively rounding up by one smallest decimal point no matter what. The whole edge case of this bug is that sometimes `currentStake` and `currentStakeRoundUp` were the same value, where due to the inconsistencies in calculating stake across multiple periods it would round up slightly greater. Note that the greatest value that `currentStakeRoundUp` could ever have achieved is `currentStake.Add(sdk.SmallestDec())`; now we're just capping it at that value.
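As a minimal sketch of that cap (plain int64 fixed-point standing in for `sdk.Dec`; the function and variable names here are illustrative, not the SDK's):

```go
package main

import "fmt"

const smallestDec = int64(1) // one unit in the last decimal place

// capStake mirrors the correction discussed above: if the reference stake
// exceeds currentStake by exactly one smallest-decimal unit, treat it as
// tolerated rounding error and clamp it back down to currentStake.
func capStake(stake, currentStake int64) int64 {
	if stake == currentStake+smallestDec {
		return currentStake
	}
	return stake
}

func main() {
	fmt.Println(capStake(1000001, 1000000)) // off by one unit -> clamped to 1000000
	fmt.Println(capStake(1000000, 1000000)) // equal -> unchanged, prints 1000000
}
```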
Are we sure that our rounding error will never exceed sdk.SmallestDec()? Why?
My current understanding of this discrepancy is that it's probabilistic: based on specific uses of validator periods, the rounding errors can compound to ultimately increase the lowest decimal place by more than 1. I've not been able to show how this could exceed one decimal point, but I also don't have a proof that it is contained within 1 decimal point. During all simulations, however, we have never seen greater than a 1 decimal point increase; only here are we seeing a 1 decimal point increase from `currentStakeRoundUp`, which holds the same value as `currentStake` (see the reference calculation #4389 (comment)).
If you have any specific alternative suggestions I'm open to them; however, I honestly don't see a viable alternative given the current distribution mechanism design.
edit: we could potentially temporarily increase the number of decimal places used within this calculation in `calculateDelegationRewards`; however, it would require significant modifications to decimal. I'm not prepared to take this on atm, but it would probably do the trick.
Opened up issue #4399 - I think that's the long-term solution; however, I don't have short-term capacity for that, and this is still a bug, so I suggest we merge this as is until performing the changes to decimal.
We are always rounding in the same direction here.
I don't mean in the sanity check, I mean in all other instances contributing to stake
where due to the inconsistencies in calculating stake across multiple periods it would round up slightly greater
It's these "inconsistencies" that I'm concerned about - shouldn't it be possible to ensure that they are all "inconsistent" in the same (safe) direction?
If you have any specific alternative suggestions I'm open to them, however I honestly don't see a viable alternative given the current distribution mechanism design.
Yes, in principle I don't see how we can't ensure that all operations round in a particular, predetermined direction if in fact they will lose precision. Is there a counterexample of that?
My guess is that we ought to be rounding in some direction somewhere where we aren't.
My guess is that we ought to be rounding in some direction somewhere where we aren't.
I don't mean in the sanity check, I mean in all other instances contributing to stake
Well, without guessing, aka by reviewing the code for the Nth time, I can say with strong confidence that this guess is incorrect: in all calculations which contribute to forming the value held in the `stake` object, we round in a direction which reduces the value of stake. Here, let me baby you through the important steps involved:
```go
// event.Fraction comes from updatedFraction;
// increasing event.Fraction decreases the multiplier, decreasing stake
stake = stake.MulTruncate(sdk.OneDec().Sub(event.Fraction))

// decreasing currentMultiplicand increases updatedFraction, which decreases stake
// decreasing newMultiplicand increases updatedFraction, which decreases stake
// the use of MulTruncate here increases updatedFraction, which decreases stake
updatedFraction := sdk.OneDec().Sub(currentMultiplicand.MulTruncate(newMultiplicand))

// currentFraction comes from updatedFraction (recursive)
currentMultiplicand := sdk.OneDec().Sub(currentFraction)

// fraction comes from effectiveFraction;
// increasing fraction decreases newMultiplicand, which increases updatedFraction, which decreases stake
newMultiplicand := sdk.OneDec().Sub(fraction)

// here the use of QuoRoundUp increases effectiveFraction, which (see above) decreases stake;
// tokensToBurn is what actually gets applied to the tokens within staking
effectiveFraction := tokensToBurn.ToDec().QuoRoundUp(validator.Tokens.ToDec())
```
https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/distribution/keeper/delegation.go#L87
https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/distribution/keeper/validator.go#L111
https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/staking/keeper/slash.go#L109
Yes, in principle I don't see how we can't ensure that all operations round in a particular, predetermined direction if in fact they will lose precision. Is there a counterexample of that?
The counterexample is not of concern to us, as it reduces the value of stake, which is a situation we ignore. However, based on how the code is structured (see the code snippets above), it's very clear that stake will very often be < currentStake based on our rounding direction.
It's these "inconsistencies" that I'm concerned about - shouldn't it be possible to ensure that they are all "inconsistent" in the same (safe) direction?
If we increase the decimal precision used in this calculation, we can ensure that the inconsistencies are tolerable, and this special correction factor:
```go
if stake.Equal(currentStake.Add(sdk.SmallestDec())) {
	stake = currentStake
}
```
can be removed. See #4399 for more thoughts on that.
tbh I think I'm just about through going over this code and these explanations; I'm very confident this solution makes sense. Of course, I invite you to implement an alternative solution if you have strong justifications (or implement #4399, which I'll probably be taking on within the next few releases either way).
Thanks @rigelrozanski - sorry if I come across as obstinate, there's just a lot of room for error in multi-stage rounding (including possible bugs in code I previously wrote).
Is it possible to change the case which causes `stake.GT(currentStake)` - https://github.com/cosmos/cosmos-sdk/pull/4389/files#diff-de71192adba700b3255b1eab62bef50eR100 - to sometimes be true? If "all calculations which contribute to forming the value held in the stake object, we round in a direction which reduces the value of stake", then we should never hit this conditional, unless `currentStake` was also rounded down, right?
The old comment discusses stake being multiplied by slash fractions - https://github.com/cosmos/cosmos-sdk/pull/4389/files#diff-de71192adba700b3255b1eab62bef50eL101 - is that referring to `currentStake` being multiplied by a slash fraction, rounded down, and ending up less than `stake`? Have we considered instead rounding up when slashes happen (but continuing to round down when accounting in F1)? If that's safe, it might save us this check.
My current understanding of this discrepancy is that it's probabilistic: based on specific uses of validator periods, the rounding errors can compound to ultimately increase the lowest decimal place by more than 1. I've not been able to show how this could exceed one decimal point, but I also don't have a proof that it is contained within 1 decimal point. During all simulations, however, we have never seen greater than a 1 decimal point increase; only here are we seeing a 1 decimal point increase from `currentStakeRoundUp`, which holds the same value as `currentStake`.
This is my specific concern. I guess this concern doesn't need to block this PR, since whatever such bug, if present, will have previously existed regardless - this PR at minimum reduces the likelihood, I agree about that. My line of questioning is as to whether we can completely eliminate the possibility of a discrepancy here.
Does that make sense?
no worries on being thorough - I guess I was just looking for more thoughtful criticisms off the bat, but I do now understand what you're referring to.
Is it possible to change the case which causes stake.GT(currentStake) - to sometimes be true? If "all calculations which contribute to forming the value held in the stake object, we round in a direction which reduces the value of stake", then we should never hit this conditional, unless currentStake was also rounded down, right?
So `stake.GT(currentStake)` will often be true when the calculations are working as expected. We hit this conditional in a non-erroring (expected) way now because we are kind of "splitting up" the whole act of slashing stake within the distribution mechanism by validator periods. Without ever rounding in a direction which would increase this stake, just the fact that we've split this up means that where the slashing is not split up per delegation, as it is in distribution, the combined slashing can slash more, which is inherent to slashing for all delegations simultaneously instead of by period. When we break out slashing by period, even if we round down for each slash fraction, it's possible, due to how much is being rounded, that we slash less when slashing by period than when we slash without periods. In other words, the full slash and the slashing by period could both be rounding down, but the slashing by period is simply rounding down less.
I don't know how else I can explain this short of creating a numerical example (which you should try for yourself!)... But I get it, like this is a really bizarre occurrence.
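Taking up the suggestion to try a numerical example: here is one possible toy illustration (plain int64 fixed-point standing in for `sdk.Dec`; the numbers and helper names are invented for illustration, not taken from the chain). It shows how two computations that each only ever round down can still disagree, with the split-up path rounding down less and so retaining slightly more stake:

```go
package main

import "fmt"

// Toy fixed-point arithmetic with 3 decimal places, standing in for sdk.Dec.
// All values are scaled by prec; mulTruncate always rounds toward zero,
// mirroring Dec.MulTruncate.
const prec = int64(1000)

func mulTruncate(a, b int64) int64 { return a * b / prec }

func main() {
	stake := int64(999000) // 999.000
	m := int64(667)        // per-period multiplier (1 - slashFraction) = 0.667

	// Split path: apply the multiplier once per period, truncating at each
	// step (loosely how distribution reconstructs stake across periods).
	split := mulTruncate(mulTruncate(stake, m), m)

	// Combined path: truncate the product of the multipliers first, then
	// apply it once to the stake.
	combined := mulTruncate(stake, mulTruncate(m, m))

	fmt.Println(split)    // 444444, i.e. 444.444
	fmt.Println(combined) // 443556, i.e. 443.556
	// Both paths only ever round down (the true value is 444.444111...),
	// yet the split path rounds down less, leaving a slightly larger stake.
}
```

The discrepancy comes from the combined path truncating the small multiplier product before it is amplified by the large stake value, which is the same flavor of magnitude-of-rounding effect described above.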
is that referring to "currentStake" being multiplied by a slash fraction, rounded down, and ending up less than stake? Have we considered instead rounding up when slashes happen (but continuing to round down when accounting in F1) - if that's safe, it might save us this check.
This wouldn't affect this scenario. For the same reasons as explained above, this bug doesn't have so much to do with the direction of rounding (which is all set up to avoid this panic condition) but with the magnitude of rounding, which changes because we are splitting up the slashing by periods.
This is my specific concern. I guess this concern doesn't need to block this PR, since whatever such bug, if present, will have previously existed regardless - this PR at minimum reduces the likelihood, I agree about that. My line of questioning is as to whether we can completely eliminate the possibility of a discrepancy here.
Does that make sense?
I understand your thinking, but for the bizarre reasons I've explained, until we increase the precision for this calculation, we're going to have to live with this solution AFAICT
When we break out slashing by period, even if we round down for each slash fraction, it's possible due to how much is being rounded that we slash less when slashing by period instead of for when we slash without periods. In other words, The full slash, and the slashing by period could both be rounding down but the slashing by period is simply rounding down less.
Ah now I see, thanks - yes, that would be possible, makes sense.
Can you copy your above explanation into the comment (it should have been there in the first place, which is probably on me)?
I understand your thinking, but for the bizarre reasons I've explained, until we increase the precision for this calculation, we're going to have to live with this solution AFAICT
Right, better precision would possibly enable a tighter bound. I wonder if we should actually increase the bound (out of which we panic) given the possible accumulated rounding errors and the unlikelihood of this being exploitable (since slashes, which are economically costly, are required), but if this never fails in simulation it does seem unlikely to happen in practice.
Yes this conversation has brought further clarity - I will improve the comment to incorporate thoughts from our conversation.
Right, better precision would possibly enable a tighter bound.
That's the thinking: the hope is that by the end of the calculation, when the high-precision decimal is reduced back to the default decimal places, the error within the bound is eliminated.
I wonder if we should actually increase the bound (out of which we panic) given the possible accumulated rounding errors and the unlikelihood of this being exploitable (since slashes, which are economically costly, are required)
I would be comfortable increasing it up to 5 smallest points; this should entirely eliminate the possibility of this error. Maybe we include 3 right now?
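A hedged sketch of what that widened tolerance might look like (int64 fixed-point standing in for `sdk.Dec`; the 3-unit margin follows the suggestion above, and the function names are hypothetical, not the merged code):

```go
package main

import "fmt"

const smallestDec = int64(1) // one unit in the last decimal place

// withinTolerance reports whether stake exceeds currentStake by at most
// margin smallest-decimal units; anything greater would indicate a real
// invariance breach rather than accumulated rounding error.
func withinTolerance(stake, currentStake, margin int64) bool {
	return stake <= currentStake+margin*smallestDec
}

func main() {
	margin := int64(3) // tolerate up to 3 smallest decimal points
	fmt.Println(withinTolerance(1000003, 1000000, margin)) // true: clamp stake to currentStake
	fmt.Println(withinTolerance(1000004, 1000000, margin)) // false: treat as a bug and panic
}
```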
* add trouble seed
* currentStakeRoundUp is now always at least currentStake + smallest-decimal-precision
* remove unused code
* remove debugs
* @alexanderbez comment
* compile fix
* better comment, increase tolerance to 3 smallest decimal points
closes #4383
`onOperation` is not a global var but a flag - `SimulateEveryOperation`
test command:
```shell
go test ./simapp/... -run TestFullAppSimulation -SimulationEnabled=true -SimulationNumBlocks=500 -SimulationGenesis= -SimulationVerbose=true -SimulationCommit=true -SimulationSeed=8090485 -SimulationPeriod=50 -v -timeout 24h
```
docs/
`clog add [section] [stanza] [message]`