-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.*: change pebble compaction concurrency w/o restart #85759
Conversation
I have a little end to end testing to do, but it should be good for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great. Just a comment about where we do the blocking.
Reviewed 11 of 14 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bananabrick and @itsbilal)
pkg/kv/kvserver/compaction_concurrency_client.go
line 1 at r1 (raw file):
// Copyright 2021 The Cockroach Authors.
nit: 2022
pkg/kv/kvserver/stores_server.go
line 178 at r1 (raw file):
err := is.execStoreCommand(ctx, req.StoreRequestHeader, func(ctx context.Context, s *Store) error { prevConcurrency := s.Engine().SetCompactionConcurrency(req.CompactionConcurrency)
nit: can we block on ctx.Done()
here, in the server instead?
Right now we have two roundtrips:
[ SQL Gateway ] -> CompactionConcurrencyRequest(new) -> [ Node w/ relevant store ]
[ SQL Gateway ] <- CompactionConcurrencyResponse(old) <- [ Node w/ relevant store ]
[ SQL gateway ] wait on ctx
[ SQL Gateway ] -> CompactionConcurrencyRequest(old) -> [ Node w/ relevant store ]
[ SQL Gateway ] <- CompactionConcurrencyResponse(new) <- [ Node w/ relevant store ]
If the SQL gateway node crashes or otherwise exits before issuing the reset command, the store will continue to use the temporary compaction concurrency.
If instead, we do a single roundtrip and do the blocking within the recipient node, a crash on the SQL gateway node will cause the store to correctly revert to the original compaction concurrency. If the recipient node crashes, it's also okay because it'll boot back up with its original compaction concurrency.
[ SQL Gateway ] -> CompactionConcurrencyRequest(new) -> [ Node w/ relevant store ]
wait on ctx [ Node w/ relevant store ]
[ SQL Gateway ] <- CompactionConcurrencyResponse() <- [ Node w/ relevant store ]
pkg/storage/pebble.go
line 713 at r1 (raw file):
func (p *Pebble) SetCompactionConcurrency(n uint64) uint64 { prevConcurrency := atomic.LoadUint64(&p.atomic.compactionConcurrency) atomic.StoreUint64(&p.atomic.compactionConcurrency, n)
This can be single atomic operation using atomic.SwapUint64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bananabrick and @itsbilal)
pkg/server/server_sql.go
line 592 at r1 (raw file):
ctx context.Context, nodeID, storeID int32, compactionConcurrency uint64, ) (uint64, error) { return 0, errorutil.UnsupportedWithMultiTenancy(errorutil.FeatureNotAvailableToNonSystemTenantsIssue)
Why is this restricted to not-work for tenants? The RPC can be added to the kv conntector, can't it?
If this is hard to fix here, please
- file an issue (and add it to the KV and multitenant project boards)
- notify @ajstorm
- leave a TODO here in a comment with a reference to the new issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)
pkg/server/server_sql.go
line 592 at r1 (raw file):
Previously, knz (kena) wrote…
Why is this restricted to not-work for tenants? The RPC can be added to the kv conntector, can't it?
If this is hard to fix here, please
- file an issue (and add it to the KV and multitenant project boards)
- notify @ajstorm
- leave a TODO here in a comment with a reference to the new issue
what's the expectation for functionality that performs host-wide changes? I would've assumed that something like changing a store's compaction concurrency would be prevented from within the context of a tenant? or do we allow tenants to manipulate the host cluster so long as they have appropriate authorization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)
pkg/server/server_sql.go
line 592 at r1 (raw file):
do we allow tenants to manipulate the host cluster so long as they have appropriate authorization?
This is the model we are going to evolve towards, yes. RFC upcoming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, @itsbilal, and @jbowens)
pkg/server/server_sql.go
line 592 at r1 (raw file):
Previously, knz (kena) wrote…
do we allow tenants to manipulate the host cluster so long as they have appropriate authorization?
This is the model we are going to evolve towards, yes. RFC upcoming.
@knz Any examples of RPC being added to kv connector. I'm not too familiar with multitenant.
pkg/storage/pebble.go
line 713 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
This can be single atomic operation using
atomic.SwapUint64
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r2, 10 of 12 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, @itsbilal, @jbowens, and @knz)
pkg/kv/kvserver/compaction_concurrency_client.go
line 24 at r3 (raw file):
// CompactionConcurrencyClient is used to temporarily modify the // compaction concurrency of a client until the request is cancelled. type CompactionConcurrencyClient struct {
I'm thinking maybe we should combine this client with the CompactEngineSpanClient
as a single StorageEngineClient
.
pkg/server/server_sql.go
line 592 at r1 (raw file):
Previously, bananabrick (Arjun Nair) wrote…
@knz Any examples of RPC being added to kv connector. I'm not too familiar with multitenant.
it might be sufficient to unconditionally do the else branch, eg
compactionConcurrencyClient := kvserver.NewCompactionConcurrencyClient(cfg.nodeDialer)
and down below
// ...
CompactEngineSpanFunc: compactEngineSpanFunc,
CompactionConcurrencyFunc: compactionConcurrencyClient.SetCompactionConcurrency,
TraceCollector: traceCollector,
// ...
pkg/storage/pebble.go
line 731 at r3 (raw file):
// SetCompactionConcurrency will return the previous compaction concurrency. // SetCompactionConcurrency is not safe for concurrent use.
I think this comment could use some clarification. This method is safe for concurrent use, although the server RPC endpoint (which is defined elsewhere) that uses this method has the potential to reset the concurrency to the wrong value if used concurrently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 14 files at r1, 1 of 7 files at r2, 12 of 12 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)
pkg/server/server_sql.go
line 592 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
it might be sufficient to unconditionally do the else branch, eg
compactionConcurrencyClient := kvserver.NewCompactionConcurrencyClient(cfg.nodeDialer)
and down below
// ... CompactEngineSpanFunc: compactEngineSpanFunc, CompactionConcurrencyFunc: compactionConcurrencyClient.SetCompactionConcurrency, TraceCollector: traceCollector, // ...
that sounds right
In case you were curious: #85954 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @itsbilal, @jbowens, and @knz)
pkg/server/server_sql.go
line 592 at r1 (raw file):
Previously, knz (kena) wrote…
that sounds right
Done.
pkg/storage/pebble.go
line 731 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think this comment could use some clarification. This method is safe for concurrent use, although the server RPC endpoint (which is defined elsewhere) that uses this method has the potential to reset the concurrency to the wrong value if used concurrently.
Done.
pkg/kv/kvserver/compaction_concurrency_client.go
line 24 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'm thinking maybe we should combine this client with the
CompactEngineSpanClient
as a singleStorageEngineClient
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Reviewed 1 of 12 files at r3, 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm and @itsbilal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)
-- commits
line 2 at r5:
It would be nice to have more in the git commit message for non-trivial changes. Our engineering standards document has more on this topic (seee "musts").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 14 files at r1, 8 of 12 files at r3, 4 of 6 files at r4, 3 of 3 files at r5.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)
pkg/sql/sem/builtins/builtins.go
line 6282 at r5 (raw file):
return nil, errors.AssertionFailedf("compaction_concurrency must be > 0") } //var prevConcurrency uint64
nit: dead code?
tftrs! bors r+ |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Merge conflict. |
During support issues with inverted lsms, we want the ability to easily change compaction concurrency to de-invert the lsm. This pr provides a simple sql command to temporarily change the compaction concurrency of a store. The compaction concurrency is changed until the sql command is cancelled. Release note: None Release justification: Useful ops tool.
bors r+ |
Build failed (retrying...): |
Build succeeded: |
During support issues with inverted lsms, we want the ability to easily
change compaction concurrency to de-invert the lsm. This pr provides
a simple sql command to temporarily change the compaction concurrency
of a store. The compaction concurrency is changed until the sql command
is cancelled.
Release Note: None
Release justification: Useful ops tool.