-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: speed up crisis invariant check (juno genesis) #12886
perf: speed up crisis invariant check (juno genesis) #12886
Conversation
… a long time running sequentially for each validator
This is a substantially different approach taken from #12885 (which I don't really understand and fear is too complicated). I feel like running these calls in parallel should be OK. Can you please post benchmarks pre/post? |
Yeah, it's actually pretty cool to see how different the approaches were. I posted the bench of the after in the PR (the entire crisis check took about a minute). I gave up on running the before check after about an hour, but I'll let it run for a bit and update the PR later on when it's over |
@Cashmaney I meant native go benchmarks that should you the ns/ms delta where you just run the invariant function against some state, preferably Juno's :) |
@alexanderbez You mean like BenchmarkInvariants? The problem is taking a real-world state and creating such a complex context in a test. Is there an easy way to pull a genesis file to simapp and run a test on that? I went with a simpler approach - debug prints inside the function😅 Post:
Pre:
I also have the two binaries here - https://github.com/Cashmaney/cosmos-sdk/releases/tag/concurrent-invariant-check |
@Cashmaney yes, take the genesis state and load the file into memory and populate the distribution and staking keepers. Then you can run benchmarks off of that. |
thank you |
Couldn't figure out how to get the simapp working properly with the Juno state, so I just ran another real-world bench on gaia with the cosmoshub-4 genesis state: Post fix:
Pre fix:
|
hey @Cashmaney, nice solution! From my testing, the primary bottleneck in the original code is creating the iterator in This solution doesn't directly address that problem, but seems to have fixed it anyway. I checked how long those calls take in this PR and its well under 1ms. I don't know why the call behavior would change so drastically by wrapping parent calls in go routines. I wonder if anyone can shed some light on what's going on here. |
Hey. Thanks - it was really funny to see two radically different solutions posted basically at the same time. You seemed to have dived a bit deeper than I did. I'm not even sure how and where that function is called from, so I'll leave it to someone more knowledgeable in both the sdk and Go to weigh in on that |
@Cashmaney there's no need to create a |
go func(val stakingtypes.ValidatorI, wg *sync.WaitGroup) { | ||
defer wg.Done() | ||
|
||
_, _ = k.WithdrawValidatorCommission(ctx, val.GetOperator()) | ||
|
||
delegationAddrs, ok := valDelegationAddrs[val.GetOperator().String()] | ||
if ok { | ||
for _, delAddr := range delegationAddrs { | ||
if _, err := k.WithdrawDelegationRewards(ctx, delAddr, val.GetOperator()); err != nil { | ||
panic(err) | ||
} | ||
} | ||
} | ||
} | ||
|
||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remaining) > 0 && remaining[0].Amount.IsNegative() { | ||
return true | ||
} | ||
remainingRewards := k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remainingRewards) > 0 && remainingRewards[0].Amount.IsNegative() { | ||
broken = true | ||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
} | ||
}(val, wg) |
Check notice
Code scanning / CodeQL
Spawning a Go routine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marbar3778 is there an annotation we can add to comments/code to suppress such warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found one for codeql (the crypto.com version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am checking now though
I'm in favor of #12885 since it's a general case solution. @alexanderbez I don't think we should be afraid to improve old code, I made an analysis of the cache iterator and it seems OK to me. |
@kocubinski to be fair, @alexanderbez saw a much different version of the code that was merely a workaround for the actual problem. The assessment of it being too complicated was probably correct. |
Ah, I see there was a force push. This is my first look at the PR, nice simplification.
I'd like to uncover why we see these perf improvements here too, but in end I don't see why we can't merge both. |
@kocubinski Based on my testing, it seems the goroutines help the code find data that will shrink/clear the unsorted cache, allowing the rest of the processing to go much more quickly. How fast that happens depends on the number of cpu cores, and varies slightly from run to run. |
…2885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file #12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it.
…2885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file #12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it. (cherry picked from commit 4fc1f73) # Conflicts: # CHANGELOG.md
…2885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file #12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it. (cherry picked from commit 4fc1f73) # Conflicts: # CHANGELOG.md
@kocubinski @alexanderbez do we still want this pr as well? |
…ckport #12885) (#12961) * perf: Amortize clearing unsorted cache entries (Juno genesis fix) (#12885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file #12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it. (cherry picked from commit 4fc1f73) # Conflicts: # CHANGELOG.md * fix conflict Co-authored-by: blazeroni <[email protected]> Co-authored-by: Julien Robert <[email protected]> Co-authored-by: Marko <[email protected]>
@marbar3778 yeah I don't see why not actually. Let me give this one quick look again, and we can merge 👍 Any objections @kocubinski? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a few minor nit comments
go func(val stakingtypes.ValidatorI, wg *sync.WaitGroup) { | ||
defer wg.Done() | ||
|
||
_, _ = k.WithdrawValidatorCommission(ctx, val.GetOperator()) | ||
|
||
delegationAddrs, ok := valDelegationAddrs[val.GetOperator().String()] | ||
if ok { | ||
for _, delAddr := range delegationAddrs { | ||
if _, err := k.WithdrawDelegationRewards(ctx, delAddr, val.GetOperator()); err != nil { | ||
panic(err) | ||
} | ||
} | ||
} | ||
} | ||
|
||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remaining) > 0 && remaining[0].Amount.IsNegative() { | ||
return true | ||
} | ||
remainingRewards := k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remainingRewards) > 0 && remainingRewards[0].Amount.IsNegative() { | ||
broken = true | ||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
} | ||
}(val, wg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marbar3778 is there an annotation we can add to comments/code to suppress such warnings?
Fixed:) |
Would like another approval prior to merge :) |
go func(val stakingtypes.ValidatorI, wg *sync.WaitGroup) { | ||
defer wg.Done() | ||
|
||
_, _ = k.WithdrawValidatorCommission(ctx, val.GetOperator()) | ||
|
||
delegationAddrs, ok := valDelegationAddrs[val.GetOperator().String()] | ||
if ok { | ||
for _, delAddr := range delegationAddrs { | ||
if _, err := k.WithdrawDelegationRewards(ctx, delAddr, val.GetOperator()); err != nil { | ||
panic(err) | ||
} | ||
} | ||
} | ||
} | ||
|
||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remaining) > 0 && remaining[0].Amount.IsNegative() { | ||
return true | ||
} | ||
remainingRewards := k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remainingRewards) > 0 && remainingRewards[0].Amount.IsNegative() { | ||
broken = true | ||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
} | ||
}(val, wg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found one for codeql (the crypto.com version)
go func(val stakingtypes.ValidatorI, wg *sync.WaitGroup) { | ||
defer wg.Done() | ||
|
||
_, _ = k.WithdrawValidatorCommission(ctx, val.GetOperator()) | ||
|
||
delegationAddrs, ok := valDelegationAddrs[val.GetOperator().String()] | ||
if ok { | ||
for _, delAddr := range delegationAddrs { | ||
if _, err := k.WithdrawDelegationRewards(ctx, delAddr, val.GetOperator()); err != nil { | ||
panic(err) | ||
} | ||
} | ||
} | ||
} | ||
|
||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remaining) > 0 && remaining[0].Amount.IsNegative() { | ||
return true | ||
} | ||
remainingRewards := k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
if len(remainingRewards) > 0 && remainingRewards[0].Amount.IsNegative() { | ||
broken = true | ||
remaining = k.GetValidatorOutstandingRewardsCoins(ctx, val.GetOperator()) | ||
} | ||
}(val, wg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am checking now though
Oh and teeny nit... I will soon post a PR that fumpts this :) |
@alexanderbez hey, I checked on codeql. Answer is no. |
Codecov Report
@@ Coverage Diff @@
## main #12886 +/- ##
==========================================
- Coverage 54.10% 53.16% -0.94%
==========================================
Files 662 641 -21
Lines 56426 54885 -1541
==========================================
- Hits 30528 29181 -1347
+ Misses 23515 23378 -137
+ Partials 2383 2326 -57 |
@alexanderbez are we merging this? |
…ckport #12885) (#12961) * perf: Amortize clearing unsorted cache entries (Juno genesis fix) (#12885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file cosmos/cosmos-sdk#12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it. (cherry picked from commit 4fc1f73) # Conflicts: # CHANGELOG.md * fix conflict Co-authored-by: blazeroni <[email protected]> Co-authored-by: Julien Robert <[email protected]> Co-authored-by: Marko <[email protected]>
We already merged #12885 -- we can close this PR. |
* fix: update x/mint parameter validation (backport #12384) (#12396) * chore: optimize get last completed upgrade (#12268) * feat: improve GetLastCompletedUpgrade * rename * use var block * fix: Simulation is not deterministic due to GenTx (backport #12374) (#12437) * fix: use go install instead of go get in makefile (#12435) * chore: fumpt sdk v45 series #12442 * feat: Move AppModule.BeginBlock and AppModule.EndBlock to extension interfaces (backport #12603) (#12638) * feat: Move AppModule.BeginBlock and AppModule.EndBlock to extension interfaces (#12603) ## Description Most modules right now have a no-op for AppModule.BeginBlock and AppModule.EndBlock. We should move these methods off the AppModule interface so we have less deadcode, and instead move them to extension interfaces. 1. I added `BeginBlockAppModule` and `EndBlockAppModule` interface. 2. Remove the dead-code from modules that do no implement them 3. Add type casting in the the module code to use the new interface Closes: #12462 * feat: add message index event attribute to authz message execution (backport #12668) (#12673) * chore(store): upgrade iavl to v0.19.0 (backport #12626) (#12697) * feat: Add `GetParamSetIfExists` to prevent panic on breaking param changes (#12724) * feat: Add convenience method for constructing key to access account's balance for a given denom (backport #12674) (#12745) * feat: Add convenience method for constructing key to access account's balance for a given denom (#12674) This PR adds a convenience method for constructing the key necessary to query for the account's balance of a given denom. I ran into this issue since we are using ABCI query now to perform balance requests because we are also requesting merkle proofs for the returned balance [here](https://github.com/celestiaorg/celestia-node/pull/911/files#diff-0ee31f5a7bd88e9f758e6bebdf3ee36365519e55a451098d9638c39afe5eac42R144). It would be nice to have a definitive convenience method for constructing the key. [Ref.](github.com/celestiaorg/celestia-node/pull/911) (cherry picked from commit a1777a8) # Conflicts: # CHANGELOG.md # x/bank/legacy/v043/store.go # x/bank/types/keys.go * Update CHANGELOG.md * fix conflict Co-authored-by: rene <[email protected]> Co-authored-by: Marko <[email protected]> * chore: bump tm in 0.45.x (#12784) * bump tendermint version * add changelog entry * replace on jhump * updates * updates * updates Co-authored-by: Aleksandr Bezobchuk <[email protected]> * chore: 0.45.7 changelog prep (#12821) * prepare for release * modify release notes * docs(staking): typo in staking/state (backport #12834) (#12836) * fix(docs): typo in staking/state (#12834) (cherry picked from commit fe89212) # Conflicts: # x/staking/spec/01_state.md * updates Co-authored-by: Ari Rubinstein <[email protected]> Co-authored-by: Julien Robert <[email protected]> * chore: fee payer event (backport #12850) (#12856) * chore: changelog update (backport #12859) (#12862) * fix: Use fixed length hex for pointer at FwdCapabilityKey (backport #11737) (#12818) * fix: Use fixed length hex for pointer at FwdCapabilityKey (backport #11737) * Update CHANGELOG.md * add comments and unit tests * feat: deterministic map iteration (backport #12781) (#12944) * feat: deterministic map iteration (#12781) ## Description We should ensure that events emitted are in a deterministic order on any node. Sorry, the previous [PR-12693](cosmos/cosmos-sdk#12693) was closed due to wrong operation. Close: #12693 * chore: bump tendermint to `0.34.21` and iavl to `0.19.1` (#12970) * perf: Amortize clearing unsorted cache entries (Juno genesis fix) (backport #12885) (#12961) * perf: Amortize clearing unsorted cache entries (Juno genesis fix) (#12885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file cosmos/cosmos-sdk#12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it. (cherry picked from commit 4fc1f73) # Conflicts: # CHANGELOG.md * fix conflict Co-authored-by: blazeroni <[email protected]> Co-authored-by: Julien Robert <[email protected]> Co-authored-by: Marko <[email protected]> * fix: proper error when parsing telemetry configuration (backport #12981) (#12999) * ci: fix release notes not populated by goreleaser (#13019) Co-authored-by: Julien Robert <[email protected]> * fix: missing return statement in BaseApp.Query (backport #13046) (#13050) * fix: missing return statement in BaseApp.Query (#13046) ## Description Closes: #13040 * chore: v0.45.8 release changelog (#13053) * chore: v0.45.8 release changelog * fix issue name Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Robert Zaremba <[email protected]> Co-authored-by: Adu <[email protected]> Co-authored-by: Julien Robert <[email protected]> Co-authored-by: Jacob Gadikian <[email protected]> Co-authored-by: Federico Kunze Küllmer <[email protected]> Co-authored-by: rene <[email protected]> Co-authored-by: Marko <[email protected]> Co-authored-by: Aleksandr Bezobchuk <[email protected]> Co-authored-by: Ari Rubinstein <[email protected]> Co-authored-by: yihuang <[email protected]> Co-authored-by: blazeroni <[email protected]>
…ckport #12885) (#12961) * perf: Amortize clearing unsorted cache entries (Juno genesis fix) (#12885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file cosmos/cosmos-sdk#12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it. (cherry picked from commit 4fc1f73) # Conflicts: # CHANGELOG.md * fix conflict Co-authored-by: blazeroni <[email protected]> Co-authored-by: Julien Robert <[email protected]> Co-authored-by: Marko <[email protected]>
…ckport cosmos#12885) (cosmos#12961) * perf: Amortize clearing unsorted cache entries (Juno genesis fix) (cosmos#12885) This change fixes a bounty by the Juno team. Juno's invariant checks took 10 hours during their most recent chain halt. This PR cuts that down to 30 seconds. See https://github.com/CosmosContracts/bounties#improve-speed-of-invariant-checks. The root problem is deep in the `can-withdraw` invariant check, which calls this repeatedly: https://github.com/cosmos/cosmos-sdk/blob/main/x/distribution/keeper/store.go#L337. Iterators have a chain of parents and in this case creates an iterator from the `cachekv` store. For the genesis file, it has a cache of 500,000+ unsorted entries, which are sorted as strings here: https://github.com/cosmos/cosmos-sdk/blob/main/store/cachekv/store.go#L314. Each delegation from `can-withdraw` uses this cache and many of the cache checks miss or are a very small range. This means very few entries get removed from the unsorted cache and they have to be re-sorted on the next call. With a full cache it takes about 180ms on my machine to sort them. This change introduce a minimum number of entries that will get processed and removed from the unsorted list. It's set at the same value that directs the code to sort them in the first place. This ensures the unsorted values get removed in a relative short amount of time, and amortizes the cost to ensure an individual check does not have to process the entire cache. ## Benchmarks On running the benchmarks included in this change produces: ```shell name old time/op new time/op delta LargeUnsortedMisses-32 21.2s ± 9% 0.0s ± 1% -99.91% (p=0.000 n=20+17) name old alloc/op new alloc/op delta LargeUnsortedMisses-32 1.64GB ± 0% 0.00GB ± 0% -99.83% (p=0.000 n=19+19) name old allocs/op new allocs/op delta LargeUnsortedMisses-32 20.0k ± 0% 41.1k ± 0% +105.23% (p=0.000 n=19+20) ``` ## Invariant checks results This is what the invariant checks for Juno look like with this change (on a Hetzner AX101): ```shell INF starting node with ABCI Tendermint in-process 4:11PM INF Starting multiAppConn service impl=multiAppConn module=proxy 4:11PM INF Starting localClient service connection=query impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=mempool impl=localClient module=abci-client 4:11PM INF Starting localClient service connection=consensus impl=localClient module=abci-client 4:11PM INF Starting EventBus service impl=EventBus module=events 4:11PM INF Starting PubSub service impl=PubSub module=pubsub 4:11PM INF Starting IndexerService service impl=IndexerService module=txindex 4:11PM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=v9.0.0-36-g8fd6f16 4:11PM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0 4:12PM INF asserting crisis invariants inv=1/11 module=x/crisis name=gov/module-account 4:12PM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/can-withdraw 4:12PM INF asserting crisis invariants inv=4/11 module=x/crisis name=distribution/reference-count 4:12PM INF asserting crisis invariants inv=5/11 module=x/crisis name=distribution/module-account 4:12PM INF asserting crisis invariants inv=6/11 module=x/crisis name=bank/nonnegative-outstanding 4:12PM INF asserting crisis invariants inv=7/11 module=x/crisis name=bank/total-supply 4:12PM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/module-accounts 4:12PM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/nonnegative-power 4:12PM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/positive-delegation 4:12PM INF asserting crisis invariants inv=11/11 module=x/crisis name=staking/delegator-shares 4:12PM INF asserted all invariants duration=28383.559601 height=4136532 module=x/crisis ``` ## Alternatives There is another PR which fixes this problem for the Juno genesis file cosmos#12886. However, because of its concurrent nature, it happens to hit a large range relatively early, clearing the unsorted entries and allowing the rest of the checks to not sort it. (cherry picked from commit 4fc1f73) # Conflicts: # CHANGELOG.md * fix conflict Co-authored-by: blazeroni <[email protected]> Co-authored-by: Julien Robert <[email protected]> Co-authored-by: Marko <[email protected]>
Description
Looking at the time it takes to handle the crisis invariant check when starting a node, the vast majority of time taken by the invariant check is due to the
can-withdraw
function, which takes ages because it sequentially loops every validator and every delegator. I switched the looping to be concurrent instead, resulting in a pretty huge speedup.This check isn't part of consensus or anything, so non-deterministic concurrency should be fair game here seeing as how rewards are independent between validators.
I'm not super familiar with the internals of x/crisis, but it doesn't seem to me that I broke anything in doing this - behaviour may differ since the code will panic on the first invariant error it encounters (which may be different between executions if there are multiple), but I feel this is a compromise that we should be willing to make. Still, I might be missing something here, so feel free to correct me if that isn't the case.
When testing using Pheonix-2 genesis file
On my 24-core AMD Ryzen 9 3900x:
Before:
After:
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
to the type prefix if API or client breaking changeCHANGELOG.md
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!
in the type prefix if API or client breaking change