feat(sdk.Context): Context.KVStore/TransientStore improve performance #14203

Closed

testinginprod
Contributor

@testinginprod testinginprod commented Dec 7, 2022

Description

partially addresses: #14202

new execution time:

goos: darwin
goarch: arm64
pkg: github.com/cosmos/cosmos-sdk/types
BenchmarkContext_KVStore
BenchmarkContext_KVStore-10    	23476627	        51.42 ns/op
PASS

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@testinginprod testinginprod requested a review from a team as a code owner December 7, 2022 15:32
@julienrbrt julienrbrt self-assigned this Dec 7, 2022
@testinginprod
Contributor Author

One more thing: switching KVStore to a pointer receiver, func (c *Context) KVStore, further cuts execution time to 84ns on my machine. I'm not sure whether this change would cause API breakage.

@alexanderbez alexanderbez added the backport/v0.47.x PR scheduled for inclusion in the v0.47's next stable release label Dec 7, 2022
@alexanderbez alexanderbez enabled auto-merge (squash) December 7, 2022 16:05
@aaronc
Member

aaronc commented Dec 7, 2022

Can we switch to pointer receivers here too @testinginprod ?

@julienrbrt julienrbrt disabled auto-merge December 7, 2022 16:38
@testinginprod testinginprod changed the title feat(sdk.Context): Context.KVStore uses local ms field instead of calling MultiStore feat(sdk.Context): Context.KVStore uses pointer receiver and local ms field instead of calling MultiStore Dec 7, 2022
@testinginprod
Contributor Author

Updated to use a pointer receiver in KVStore and TransientStore. NOTE: calling ctx.KVStore(..) on an addressable Context variable is implicitly expanded to (&ctx).KVStore(..), as mentioned here: https://github.com/golang/go/wiki/MethodSets#variables.
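
The method-set rule referenced above can be checked with a small standalone program. This is a minimal sketch using a hypothetical `Ctx` type and `Namer` interface (neither is part of the SDK). It shows that a pointer-receiver method can be called on an addressable variable, and that only `*Ctx`, not `Ctx`, satisfies an interface listing that method.

```go
package main

import "fmt"

// Ctx is a hypothetical stand-in for a context-like struct; it is not sdk.Context.
type Ctx struct{ name string }

// Name uses a pointer receiver, like KVStore after this PR.
func (c *Ctx) Name() string { return c.name }

// Namer is a hypothetical interface abstracting over the context.
type Namer interface{ Name() string }

// viaVariable calls the pointer-receiver method on an addressable variable.
func viaVariable() string {
	c := Ctx{name: "addressable"}
	return c.Name() // the compiler treats this as (&c).Name()
}

// viaInterface shows that the interface must be given a *Ctx.
func viaInterface() string {
	c := Ctx{name: "pointer"}
	var n Namer = &c // `var n Namer = c` would not compile
	return n.Name()
}

func main() {
	fmt.Println(viaVariable())
	fmt.Println(viaInterface())
}
```

Note that the shorthand only applies to addressable operands: a call made directly on a composite literal, such as `Ctx{}.Name()`, is rejected by the compiler.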

@testinginprod testinginprod changed the title feat(sdk.Context): Context.KVStore uses pointer receiver and local ms field instead of calling MultiStore feat(sdk.Context): Context.KVStore/TransientStore uses pointer receiver; and local ms field instead of calling MultiStore Dec 7, 2022
@testinginprod
Contributor Author

@aaronc @alexanderbez execution time is now down to 64ns after using the gasMeter field instead of the GasMeter method.

@testinginprod testinginprod changed the title feat(sdk.Context): Context.KVStore/TransientStore uses pointer receiver; and local ms field instead of calling MultiStore feat(sdk.Context): Context.KVStore/TransientStore improve performance Dec 7, 2022
Member

@aaronc aaronc left a comment

This needs a breaking CHANGELOG entry. My preference would be to just convert to a pointer value internally within Context and leave the public API unchanged.

Comment on lines 89 to 90
gasConfig := storetypes.KVGasConfig()
return gaskv.NewStore(store, g.GasMeter, gasConfig)
Contributor

Why does this offer any performance improvement?

Contributor Author

@testinginprod testinginprod Dec 8, 2022

the perf improvement is this: https://github.com/cosmos/cosmos-sdk/pull/14203/files#diff-abad3e7f81deb0a089f7e974e60a081e67a7b21e5c6ba5c7b2be3785a66445afR15

This saves the alloc + copy. The gaskv config is static for a given context: while the context is in use, the gas configuration (how much we charge for each Read/Write/Iter/etc.) does not change, so the copy is unnecessary. To further support this point, the only way to modify the GasConfig is by calling WithGasConfig on Context, which yields a Context copy. So the behaviour before my changes was that changes to GasConfig are not propagated to existing gaskv instances. My perf improvement retains this behaviour.

My benches showed it saved a further 10ns
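
The argument above (a `With*` method returns a modified copy, so stores created earlier never see the new config) can be sketched with simplified types. `GasConfig`, `Store` and `Ctx` below are hypothetical stand-ins for `storetypes.GasConfig`, the gaskv store and `sdk.Context`; the real types have more fields and different constructors.

```go
package main

import "fmt"

// GasConfig is a hypothetical, reduced version of a gas configuration.
type GasConfig struct{ ReadCost uint64 }

// Store keeps a pointer to the config, as proposed in the PR.
type Store struct{ cfg *GasConfig }

// Ctx is a hypothetical context; it is not sdk.Context.
type Ctx struct{ cfg GasConfig }

// WithGasConfig has a value receiver: it modifies and returns a copy.
func (c Ctx) WithGasConfig(cfg GasConfig) Ctx {
	c.cfg = cfg
	return c
}

// Store hands out a store that points at this context's own config field.
func (c *Ctx) Store() Store { return Store{cfg: &c.cfg} }

// readCosts returns the read cost seen by a store created before
// WithGasConfig and by one created from the derived context.
func readCosts() (before, after uint64) {
	ctx := Ctx{cfg: GasConfig{ReadCost: 10}}
	oldStore := ctx.Store()

	ctx2 := ctx.WithGasConfig(GasConfig{ReadCost: 99}) // ctx itself is untouched
	newStore := ctx2.Store()

	return oldStore.cfg.ReadCost, newStore.cfg.ReadCost
}

func main() {
	before, after := readCosts()
	fmt.Println(before, after) // 10 99
}
```

The store created from the original context still reads 10, which is the "not propagated" behaviour described above. This only holds as long as nothing writes to the config field in place.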

@peterbourgon peterbourgon Dec 8, 2022

> this allows us to save time because of alloc + copy

Cool, where can we see the benchmark results that demonstrate the perf increases? It would be super cool if changing the representation of this ~50 byte struct from a value to a pointer managed to produce performance benefits from the reduced copy costs that outweighed the additional costs it incurs on the allocator and GC.

@github-actions github-actions bot removed the C:x/group label Dec 8, 2022
@peterbourgon

Is the claim here that changing the gasConfig field of the KVStore struct from a value to a pointer represents a meaningful improvement in performance? If so, impressive! Can we see the benchstat deltas?

Comment on lines -280 to 284
-func (c Context) KVStore(key storetypes.StoreKey) KVStore {
-	return gaskv.NewStore(c.MultiStore().GetKVStore(key), c.GasMeter(), c.kvGasConfig)
+// NOTE: Uses pointer receiver to save on execution time.
+func (c *Context) KVStore(key storetypes.StoreKey) KVStore {
+	kv := c.ms.GetKVStore(key)
+	return gaskv.NewStore(kv, c.gasMeter, c.kvGasConfig)
 }

Callers who invoke c.KVStore will get a KVStore value with a kvGasConfig that is no longer a copy of the config data, but instead is a pointer to the same storetypes.GasConfig value of the Context from which the KVStore was derived. This is a fundamental change to the semantics of the KVStore method, and of the Context API in general. It's important to establish the safety of large changes like this with comprehensive test coverage.
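
The semantic difference described here, a store that holds a copy of the config versus one that holds a pointer to the context's config, can be illustrated in isolation. The types below are hypothetical and much smaller than the SDK's. The in-place write is also an assumption made for the sake of the example; the thread does not establish that any SDK code performs one.

```go
package main

import "fmt"

// GasConfig is a hypothetical, reduced gas configuration.
type GasConfig struct{ ReadCost uint64 }

// Ctx is a hypothetical context; it is not sdk.Context.
type Ctx struct{ cfg GasConfig }

// CopyStore models the old behaviour: the config is copied into the store.
type CopyStore struct{ cfg GasConfig }

// SharedStore models the new behaviour: the store points at the context's config.
type SharedStore struct{ cfg *GasConfig }

// observe builds both kinds of store, mutates the context's config in place,
// and reports what each store sees afterwards.
func observe() (copied, shared uint64) {
	ctx := &Ctx{cfg: GasConfig{ReadCost: 10}}
	cs := CopyStore{cfg: ctx.cfg}
	ss := SharedStore{cfg: &ctx.cfg}

	ctx.cfg.ReadCost = 99 // hypothetical in-place mutation

	return cs.cfg.ReadCost, ss.cfg.ReadCost
}

func main() {
	copied, shared := observe()
	fmt.Println(copied, shared) // 10 99
}
```

With a copy, the store is isolated from later writes. With a pointer, it observes them, so the safety of the change depends on the config never being written after a store has been handed out.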

Contributor Author

+1 if we need testing for this, but the old behaviour is retained.

@testinginprod
Contributor Author

testinginprod commented Dec 8, 2022

@peterbourgon yes sir...

% benchstat 9bf027460bae724bbc1fddd3541248b7fe18ab2c.txt 8a9565fd0aca7909851fc90053e2cb8e25583761.txt
name                old time/op    new time/op    delta
Context_KVStore-10     192ns ± 1%      50ns ± 0%  -73.77%  (p=0.000 n=17+17)

name                old alloc/op   new alloc/op   delta
Context_KVStore-10     96.0B ± 0%     48.0B ± 0%  -50.00%  (p=0.000 n=20+20)

name                old allocs/op  new allocs/op  delta
Context_KVStore-10      1.00 ± 0%      1.00 ± 0%     ~     (all equal)

The two hashes refer to commits: the former should be main, the latter should be the last commit of this branch. Respective benches:
main (note: the initial bench was actually much lower, 150ns as mentioned in the issue; I ran it multiple times but it's 190ns right now):

goos: darwin
goarch: arm64
pkg: github.com/cosmos/cosmos-sdk/types/bench
BenchmarkContext_KVStore-10    	 6275276	       189.7 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6307548	       190.6 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6283483	       191.7 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6259264	       191.3 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6257390	       191.5 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6279206	       191.7 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6243421	       191.4 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6250104	       191.5 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6258055	       193.0 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6301917	       192.6 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6252657	       192.8 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6187279	       192.1 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6263904	       192.6 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6172588	       245.2 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 5175621	       201.4 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6196754	       197.1 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6000902	       193.0 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6173347	       192.8 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6232633	       192.7 ns/op	      96 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	 6203295	       192.7 ns/op	      96 B/op	       1 allocs/op
PASS
ok  	github.com/cosmos/cosmos-sdk/types/bench	28.327s

current branch:

goos: darwin
goarch: arm64
pkg: github.com/cosmos/cosmos-sdk/types/bench
BenchmarkContext_KVStore-10    	22164160	        50.18 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23854467	        50.31 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23845243	        50.32 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23878081	        50.44 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23988165	        50.35 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23913512	        50.45 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23561230	        50.28 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23838060	        50.85 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23824905	        50.39 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23574750	        50.39 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23577394	        50.39 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23634318	        50.67 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23897776	        50.34 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23702649	        50.36 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	24019234	        50.37 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23486774	        50.54 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23950899	        50.39 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23440113	        50.29 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23687599	        51.03 ns/op	      48 B/op	       1 allocs/op
BenchmarkContext_KVStore-10    	23768612	        50.44 ns/op	      48 B/op	       1 allocs/op
PASS
ok  	github.com/cosmos/cosmos-sdk/types/bench	25.251s

@peterbourgon

peterbourgon commented Dec 8, 2022

Neat! It seems like this benchmark is measuring the overhead of calling methods on value receivers vs. pointer receivers. It's definitely true that types of size larger than a pointer will copy more data, and burn more CPU, in the former case vs. the latter. Looks like here it's worth about 100ns per call to KVStore. But is that meaningful? What calls KVStore, and how often, and what does that call represent in terms of the high-level operation(s) which invoke it? Does subtracting 100ns from this specific call improve system performance in a measurable way? The PR modifies the semantics of the Context API, and introduces potential concurrency issues — are those modifications tested and ensured to be safe?
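
For readers who want to reproduce the value-versus-pointer receiver comparison in isolation, here is a minimal sketch. The `big` struct and its 256-byte size are hypothetical choices for illustration and are unrelated to the actual layout of `sdk.Context`. The methods are marked `//go:noinline` so the calls are not optimised away. Absolute numbers will vary by machine and Go version, so none are given.

```go
package main

import (
	"fmt"
	"testing"
)

// big is a hypothetical 256-byte struct used only to make copy cost visible.
type big struct {
	pad [31]uint64
	n   uint64
}

// byValue receives a full copy of the struct on every call.
//
//go:noinline
func (b big) byValue() uint64 { return b.n }

// byPointer receives only the address of the struct.
//
//go:noinline
func (b *big) byPointer() uint64 { return b.n }

// sink keeps the compiler from discarding the results.
var sink uint64

func benchValue(b *testing.B) {
	v := big{n: 1}
	for i := 0; i < b.N; i++ {
		sink += v.byValue()
	}
}

func benchPointer(b *testing.B) {
	v := &big{n: 1}
	for i := 0; i < b.N; i++ {
		sink += v.byPointer()
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	fmt.Println("value:  ", testing.Benchmark(benchValue))
	fmt.Println("pointer:", testing.Benchmark(benchPointer))
}
```

A micro-benchmark like this measures only call overhead. Whether that overhead matters for a whole transaction or query is a separate question that needs profiling of the full path.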

@testinginprod
Contributor Author

testinginprod commented Dec 8, 2022

> are those modifications tested and ensured to be safe

This is a fair point, and I will add behavioural tests. But yeah the modifications made are safe.
The only cost is the fact that interfaces abstracting sdk.Context using KVStore and TransientStore methods will be broken (nothing was broken because of this API change in the SDK).

As for the performance benefit, I think the change is trivial, easy to understand, and its consequences are easy to reason about (none, besides what is mentioned above).

I think this is a starting point: sdk.Context is passed everywhere as a value, which incurs copy costs. The struct is too big to be passed around like this, and sdk.Context is core to every module, for instance:

  • We use it to emit events (have you checked how many events we create for a single TX? I did, and it's a LOT); every event emission is ctx.EventManager()....
  • We pass it through our chain of ante handlers; each ante handler can pass it to anywhere from 1 to 3-4 keepers (all ctx copies, of course, even if they shouldn't be), and each of those keepers will call KVStore at least once.
  • And TX execution hasn't even officially started.
  • Then we start executing txs, we get again our context, pass it to keepers all around, call KVStore again... etc etc etc.
  • Then we have post handlers...

Besides txs,

  • there's begin block, everything involving keepers, and ctx being passed all around and functions calling methods on it...
  • there's endblock, same as above.

I cannot give you the exact number of times ctx is passed around or the exact number of times methods are called on it. But my best guess is that it's the second busiest type in the SDK, close behind KVStore.

@peterbourgon

peterbourgon commented Dec 8, 2022

I agree that the current implementation, which passes sdk.Context around as a value, is weird and bad. But you can't just assert that it's "too big to be passed around like this" as some presumptive claim 😉 First and foremost, because the entire Cosmos ecosystem basically assumes that Context methods are defined on value copies and can't mutate their receiver, and so changing any method to a pointer receiver will have enormous, potentially state- and consensus-breaking, impacts which are absolutely not captured by tests. But also, because the performance problems in the SDK — which are important and serious and worth solving! — are absolutely not influenced by changes at this level. Making an ABCI query traverses, last I checked, almost 36 layers of abstraction, and locks at least 3 exclusive mutexes. Reducing copy costs at the level of tens- or hundreds-of-nanoseconds is an exercise in code golfing with no measurable impact on the bottom line.

@testinginprod
Contributor Author

testinginprod commented Dec 8, 2022

> Making an ABCI query traverses, last I checked, almost 36 layers of abstraction, and locks at least 3 exclusive mutexes. Reducing copy costs at the level of tens- or hundreds-of-nanoseconds is an exercise in code golfing with no measurable impact on the bottom line.

Terrific! Where are the benchmarks, flamegraphs, or any profiling data? I was asked to provide mine and I did, now it's your turn to provide yours 😉 .

> First and foremost, because the entire Cosmos ecosystem basically assumes that Context methods are defined on value copies and can't mutate their receiver.

If you can give a [concrete] example in which the proposed changes mutate anything or break some behaviour, I'd be happy to retract my PR.

@yihuang
Collaborator

yihuang commented Dec 8, 2022

> First and foremost, because the entire Cosmos ecosystem basically assumes that Context methods are defined on value copies and can't mutate their receiver.

> If you can make a [concrete] example for which the changes proposed can in any way mutate, or break some behaviour I'd be happy to retract my PR.

I agree with @peterbourgon that this is a significant behavior change. I don't mean it should be blocked, but the community should be aware of the consequences.
We don't normally do concurrency in the consensus state machine, but where there is any, people should be careful.
gRPC queries run concurrently, so we should make sure the contexts there are copied for each connection.
In some old versions of ethermint, we saved the current context in the keeper to cope with the go-ethereum StateDB API. We don't do that anymore; otherwise it would have problems with this change.
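
The "copy the context per connection" advice can be sketched as follows. `Ctx`, `SetHeight` and `serve` are hypothetical names chosen for this example; they do not mirror the SDK's gRPC query plumbing. Each goroutine receives its own struct copy, so a pointer-receiver method called inside it cannot affect the caller's value or other goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// Ctx is a hypothetical context with one mutable field; it is not sdk.Context.
type Ctx struct{ height int64 }

// SetHeight mutates its receiver, which requires a pointer receiver.
func (c *Ctx) SetHeight(h int64) { c.height = h }

// serve runs n handlers concurrently, giving each its own copy of base.
func serve(base Ctx, n int) []int64 {
	out := make([]int64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int, ctx Ctx) { // ctx is a per-goroutine copy
			defer wg.Done()
			ctx.SetHeight(int64(i)) // touches only this goroutine's copy
			out[i] = ctx.height     // each goroutine writes a distinct element
		}(i, base)
	}
	wg.Wait()
	return out
}

func main() {
	base := Ctx{height: -1}
	fmt.Println(serve(base, 3), base.height) // [0 1 2] -1
}
```

A struct copy is shallow: any pointer, map or slice fields inside the copied value are still shared between copies, so this pattern protects only the fields stored by value.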

@aaronc aaronc self-assigned this Dec 8, 2022
@testinginprod
Contributor Author

closing in favor of: #14354

Labels
backport/v0.47.x PR scheduled for inclusion in the v0.47's next stable release C:Store
7 participants