Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.*: change pebble compaction concurrency w/o restart #85759

Merged
merged 1 commit into from
Aug 15, 2022
Merged

.*: change pebble compaction concurrency w/o restart #85759

merged 1 commit into from
Aug 15, 2022

Conversation

bananabrick
Copy link
Contributor

@bananabrick bananabrick commented Aug 8, 2022

During support issues with inverted lsms, we want the ability to easily
change compaction concurrency to de-invert the lsm. This pr provides
a simple sql command to temporarily change the compaction concurrency
of a store. The compaction concurrency is changed until the sql command
is cancelled.

Release Note: None
Release justification: Useful ops tool.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@bananabrick bananabrick requested review from jbowens and itsbilal August 9, 2022 18:15
@bananabrick bananabrick marked this pull request as ready for review August 9, 2022 18:25
@bananabrick bananabrick requested review from a team as code owners August 9, 2022 18:25
@bananabrick bananabrick requested a review from a team August 9, 2022 18:25
@bananabrick
Copy link
Contributor Author

I have a little end to end testing to do, but it should be good for review.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great. Just a comment about where we do the blocking.

Reviewed 11 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bananabrick and @itsbilal)


pkg/kv/kvserver/compaction_concurrency_client.go line 1 at r1 (raw file):

// Copyright 2021 The Cockroach Authors.

nit: 2022


pkg/kv/kvserver/stores_server.go line 178 at r1 (raw file):

	err := is.execStoreCommand(ctx, req.StoreRequestHeader,
		func(ctx context.Context, s *Store) error {
			prevConcurrency := s.Engine().SetCompactionConcurrency(req.CompactionConcurrency)

nit: can we block on ctx.Done() here, in the server instead?

Right now we have two roundtrips:

[ SQL Gateway ] -> CompactionConcurrencyRequest(new)  -> [ Node w/ relevant store ]
[ SQL Gateway ] <- CompactionConcurrencyResponse(old) <- [ Node w/ relevant store ]
[ SQL gateway ] wait on ctx
[ SQL Gateway ] -> CompactionConcurrencyRequest(old)  -> [ Node w/ relevant store ]
[ SQL Gateway ] <- CompactionConcurrencyResponse(new) <- [ Node w/ relevant store ]

If the SQL gateway node crashes or otherwise exits before issuing the reset command, the store will continue to use the temporary compaction concurrency.

If instead, we do a single roundtrip and do the blocking within the recipient node, a crash on the SQL gateway node will cause the store to correctly revert to the original compaction concurrency. If the recipient node crashes, it's also okay because it'll boot back up with its original compaction concurrency.

[ SQL Gateway ] -> CompactionConcurrencyRequest(new)  -> [ Node w/ relevant store ]
                                             wait on ctx [ Node w/ relevant store ]
[ SQL Gateway ] <- CompactionConcurrencyResponse()    <- [ Node w/ relevant store ]

pkg/storage/pebble.go line 713 at r1 (raw file):

func (p *Pebble) SetCompactionConcurrency(n uint64) uint64 {
	prevConcurrency := atomic.LoadUint64(&p.atomic.compactionConcurrency)
	atomic.StoreUint64(&p.atomic.compactionConcurrency, n)

This can be single atomic operation using atomic.SwapUint64.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bananabrick and @itsbilal)


pkg/server/server_sql.go line 592 at r1 (raw file):

			ctx context.Context, nodeID, storeID int32, compactionConcurrency uint64,
		) (uint64, error) {
			return 0, errorutil.UnsupportedWithMultiTenancy(errorutil.FeatureNotAvailableToNonSystemTenantsIssue)

Why is this restricted to not-work for tenants? The RPC can be added to the kv conntector, can't it?

If this is hard to fix here, please

  1. file an issue (and add it to the KV and multitenant project boards)
  2. notify @ajstorm
  3. leave a TODO here in a comment with a reference to the new issue

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)


pkg/server/server_sql.go line 592 at r1 (raw file):

Previously, knz (kena) wrote…

Why is this restricted to not-work for tenants? The RPC can be added to the kv conntector, can't it?

If this is hard to fix here, please

  1. file an issue (and add it to the KV and multitenant project boards)
  2. notify @ajstorm
  3. leave a TODO here in a comment with a reference to the new issue

what's the expectation for functionality that performs host-wide changes? I would've assumed that something like changing a store's compaction concurrency would be prevented from within the context of a tenant? or do we allow tenants to manipulate the host cluster so long as they have appropriate authorization?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)


pkg/server/server_sql.go line 592 at r1 (raw file):

do we allow tenants to manipulate the host cluster so long as they have appropriate authorization?

This is the model we are going to evolve towards, yes. RFC upcoming.

Copy link
Contributor Author

@bananabrick bananabrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, @itsbilal, and @jbowens)


pkg/server/server_sql.go line 592 at r1 (raw file):

Previously, knz (kena) wrote…

do we allow tenants to manipulate the host cluster so long as they have appropriate authorization?

This is the model we are going to evolve towards, yes. RFC upcoming.

@knz Any examples of RPC being added to kv connector. I'm not too familiar with multitenant.


pkg/storage/pebble.go line 713 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

This can be single atomic operation using atomic.SwapUint64.

Done.

@bananabrick bananabrick requested review from jbowens and knz August 10, 2022 21:50
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r2, 10 of 12 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, @itsbilal, @jbowens, and @knz)


pkg/kv/kvserver/compaction_concurrency_client.go line 24 at r3 (raw file):

// CompactionConcurrencyClient is used to temporarily modify the
// compaction concurrency of a client until the request is cancelled.
type CompactionConcurrencyClient struct {

I'm thinking maybe we should combine this client with the CompactEngineSpanClient as a single StorageEngineClient.


pkg/server/server_sql.go line 592 at r1 (raw file):

Previously, bananabrick (Arjun Nair) wrote…

@knz Any examples of RPC being added to kv connector. I'm not too familiar with multitenant.

it might be sufficient to unconditionally do the else branch, eg

	compactionConcurrencyClient := kvserver.NewCompactionConcurrencyClient(cfg.nodeDialer)

and down below

		// ...
		CompactEngineSpanFunc:     compactEngineSpanFunc,
		CompactionConcurrencyFunc: compactionConcurrencyClient.SetCompactionConcurrency,
		TraceCollector:            traceCollector,
		// ...

pkg/storage/pebble.go line 731 at r3 (raw file):

// SetCompactionConcurrency will return the previous compaction concurrency.
// SetCompactionConcurrency is not safe for concurrent use.

I think this comment could use some clarification. This method is safe for concurrent use, although the server RPC endpoint (which is defined elsewhere) that uses this method has the potential to reset the concurrency to the wrong value if used concurrently.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 14 files at r1, 1 of 7 files at r2, 12 of 12 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)


pkg/server/server_sql.go line 592 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

it might be sufficient to unconditionally do the else branch, eg

	compactionConcurrencyClient := kvserver.NewCompactionConcurrencyClient(cfg.nodeDialer)

and down below

		// ...
		CompactEngineSpanFunc:     compactEngineSpanFunc,
		CompactionConcurrencyFunc: compactionConcurrencyClient.SetCompactionConcurrency,
		TraceCollector:            traceCollector,
		// ...

that sounds right

@knz
Copy link
Contributor

knz commented Aug 11, 2022

In case you were curious: #85954

@bananabrick bananabrick requested a review from jbowens August 11, 2022 18:36
Copy link
Contributor Author

@bananabrick bananabrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm, @itsbilal, @jbowens, and @knz)


pkg/server/server_sql.go line 592 at r1 (raw file):

Previously, knz (kena) wrote…

that sounds right

Done.


pkg/storage/pebble.go line 731 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think this comment could use some clarification. This method is safe for concurrent use, although the server RPC endpoint (which is defined elsewhere) that uses this method has the potential to reset the concurrency to the wrong value if used concurrently.

Done.


pkg/kv/kvserver/compaction_concurrency_client.go line 24 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm thinking maybe we should combine this client with the CompactEngineSpanClient as a single StorageEngineClient.

Done.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Looks great!

Reviewed 1 of 12 files at r3, 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm and @itsbilal)

Copy link
Collaborator

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)


-- commits line 2 at r5:
It would be nice to have more in the git commit message for non-trivial changes. Our engineering standards document has more on this topic (seee "musts").

Copy link
Collaborator

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! :lgtm:

Reviewed 1 of 14 files at r1, 8 of 12 files at r3, 4 of 6 files at r4, 3 of 3 files at r5.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @ajstorm, @bananabrick, and @itsbilal)


pkg/sql/sem/builtins/builtins.go line 6282 at r5 (raw file):

					return nil, errors.AssertionFailedf("compaction_concurrency must be > 0")
				}
				//var prevConcurrency uint64

nit: dead code?

@bananabrick
Copy link
Contributor Author

tftrs!

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 12, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 13, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 13, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 13, 2022

Merge conflict.

During support issues with inverted lsms, we want the ability to easily
change compaction concurrency to de-invert the lsm. This pr provides
a simple sql command to temporarily change the compaction concurrency
of a store. The compaction concurrency is changed until the sql command
is cancelled.

Release note: None
Release justification: Useful ops tool.
@bananabrick
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 15, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 15, 2022

Build succeeded:

@craig craig bot merged commit 55e0f99 into cockroachdb:master Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants