Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: eagerly carry out chunked GC #24209

Merged
merged 2 commits into from
Apr 5, 2018
Merged

Conversation

tbg
Copy link
Member

@tbg tbg commented Mar 26, 2018

The GC queue was already splitting up the set of keys to be deleted into
small parts to avoid overloading the Raft machinery. From there, it was
only a small step to let it dispatch the corresponding GCRequests eagerly.

The upshot is that now we don't have to hold all of the keys to be deleted
in memory any more.

With appropriate testing, I think this is a change suitable for inclusion
in 2.0.1. "Appropriate testing" includes a roachtest which creates a large
range (larger than available RAM) and fills it with GC'able data. (A
variant of the DROP test that increases the range size suitably should do
it).

Release note (performance improvement): lowered the amount of memory used
during garbage collection of old versions.

@tbg tbg requested a review from a team as a code owner March 26, 2018 16:57
@tbg tbg requested a review from a team March 26, 2018 16:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@bdarnell
Copy link
Contributor

creates a large range (larger than available RAM)

Shouldn't that be disallowed by the backpressure introduced by #21777?

This :lgtm: for 2.1 because it is a step towards increasing range size limits, but before we consider this for 2.0.1 we should do more investigation to figure out how we got such an oversized range to begin with.


Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 517 at r2 (raw file):

}

// NoopGCer implements storage.GCer by doing nothing.

Nit: we're in package storage so no need to repeat it in the doc comments.


pkg/storage/gc_queue.go, line 550 at r2 (raw file):

	var ba roachpb.BatchRequest

	// Technically not needed since we're talking directly to the Range.

s/Range/Replica/


pkg/storage/gc_queue.go, line 568 at r2 (raw file):

	req.Threshold = thresh.Key
	req.TxnSpanGCThreshold = thresh.Txn
	return r.send(ctx, req)

Comment that we specifically do not batch SetGCThreshold with keys to GC to minimize contention in the command queue.


Comments from Reviewable

@bdarnell
Copy link
Contributor

And approved for 2.0.1 following discussion in #24215

@nvanbenschoten
Copy link
Member

Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 538 at r2 (raw file):

	desc := r.repl.Desc()
	var template roachpb.GCRequest
	template.Key = desc.StartKey.AsRawKey()

Consider adding a TODO to constrain this header to only what's actually in the GCRequest. Now that we're chunking these requests, we can be a lot more CommandQueue friendly.


pkg/storage/gc_queue.go, line 555 at r2 (raw file):

	var union roachpb.RequestUnion
	union.MustSetInner(&req)
	ba.Requests = append(ba.Requests, union)

ba.Add?


pkg/storage/gc_queue.go, line 568 at r2 (raw file):

minimize contention in the command queue

How does this minimize contention? Just because SetGCThreshold will now be very quick? They both still declare a write over the entire range of keys.

On this note, what time do these GCRequests indicate to the CommandQueue that they're writing at? We should be able to move the requests into the past to the threshold timestamp such that #14342 ensures that future reads don't block on them. In fact, writes shouldn't need to block on them either, although I think there would be practical complications to allowing that.


pkg/storage/gc_queue.go, line 819 at r2 (raw file):

							// Allow releasing the memory backing batchGCKeys.
							iter.ResetAllocator()
							batchGCKeys = []roachpb.GCRequest_GCKey{}

batchGCKeys = nil


pkg/storage/gc_queue.go, line 836 at r2 (raw file):

	// From now on, all newly added keys are range-local.
	// TODO(tschottdorf): Might need to use two requests at some point since we

Does this TODO go away?


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Mar 26, 2018

Shouldn't that be disallowed by the backpressure introduced by #21777?

True (and exlained in #24215). The comment also applies to scenarios in which we can run with range sizes larger than the available RAM, though, which I hope will be a real option in 2.1 or 2.2 (and which will be what the roachtest will do, except it waits for upreplication before growing the data set to avoid snapshot blow-out).


Review status: 1 of 4 files reviewed at latest revision, 7 unresolved discussions.


pkg/storage/gc_queue.go, line 517 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Nit: we're in package storage so no need to repeat it in the doc comments.

Forgot to update this when I moved this to storage, done.


pkg/storage/gc_queue.go, line 538 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider adding a TODO to constrain this header to only what's actually in the GCRequest. Now that we're chunking these requests, we can be a lot more CommandQueue friendly.

This is actually happening under the hood already, see

// Intentionally don't call DefaultDeclareKeys: the key range in the header
. We've argued for a while that we should be sending individual requests instead, see #7880, and then we wouldn't have to set this fake span (which we use to bounce the commands on split out of paranoia) at all.
Let me know if you still think there should be a TODO.


pkg/storage/gc_queue.go, line 550 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/Range/Replica/

Done.


pkg/storage/gc_queue.go, line 555 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

ba.Add?

🤦‍♂️ Done


pkg/storage/gc_queue.go, line 568 at r2 (raw file):
See the other comment above. GCRequest declares a nonstandard set of keys.

On this note, what time do these GCRequests indicate to the CommandQueue that they're writing at?

They declare clock.Now() (see template()). You're right that we could run them at the larger of the two involved thresholds instead. That could actually be a nice win. And yeah, ditto for writes. Could we achieve this by having the requests declare synthetic keys only checked by other GCRequests? I.e. if there is a GCRequest for /table/a we really declare /local/magic/table/a to achieve that GCRequests will block on each other, but other traffic doesn't. I don't like the magic of that, though.

Either way, we shouldn't introduce any of these things without a benchmark. Some queue-type workload roachtests are overdue anyway, those should be able to profit from these kinds of tweaks.

Filed #24237.


pkg/storage/gc_queue.go, line 819 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

batchGCKeys = nil

Done.


pkg/storage/gc_queue.go, line 836 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this TODO go away?

I'm not seeing a TODO.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Review status: 1 of 4 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 538 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This is actually happening under the hood already, see

// Intentionally don't call DefaultDeclareKeys: the key range in the header
. We've argued for a while that we should be sending individual requests instead, see #7880, and then we wouldn't have to set this fake span (which we use to bounce the commands on split out of paranoia) at all.
Let me know if you still think there should be a TODO.

Ah ok, I didn't know we were doing that. Thanks for explaining. No need for a TODO.


pkg/storage/gc_queue.go, line 568 at r2 (raw file):
Thanks for filing!

Could we achieve this by having the requests declare synthetic keys only checked by other GCRequests?

That should work, but this has me curious about something else. If GCRequest is declared at clock.Now(), is it ignored by historical reads in the commandQueue? In that case, how does the synchronization work when we check that the requestCanProceed? Is it possible that a historical read passes this check but still finds that the keys it is trying to read have been GCed?


pkg/storage/gc_queue.go, line 836 at r2 (raw file):
The one that was removed. It still seems applicable.

// TODO(tschottdorf): Might need to use two requests at some point since we
// hard-coded the full non-local key range in the header, but that does
// not take into account the range-local keys. It will be OK as long as
// we send directly to the Replica, though.


Comments from Reviewable

@tbg
Copy link
Member Author

tbg commented Apr 5, 2018

Review status: 1 of 4 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/gc_queue.go, line 568 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for filing!

Could we achieve this by having the requests declare synthetic keys only checked by other GCRequests?

That should work, but this has me curious about something else. If GCRequest is declared at clock.Now(), is it ignored by historical reads in the commandQueue? In that case, how does the synchronization work when we check that the requestCanProceed? Is it possible that a historical read passes this check but still finds that the keys it is trying to read have been GCed?

I think you might be right and we should fix it but morally this won't happen. Note that we bump the thresholds in the first command that doesn't actually GC any keys. This means that in practice it's visible to all reads by the time we actually remove data. I think there's a scenario in which a read slips by the GC bump but gets stalled before opening a RocksDB snapshot, so that when it finally does we have actually already removed keys? Setting ReplicatedEvalResult.BlockReads = true should fix that problem.

Filed https://github.com/cockroachdb/cockroach/issues/new.


pkg/storage/gc_queue.go, line 836 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The one that was removed. It still seems applicable.

// TODO(tschottdorf): Might need to use two requests at some point since we
// hard-coded the full non-local key range in the header, but that does
// not take into account the range-local keys. It will be OK as long as
// we send directly to the Replica, though.

I'm not sure about that TODO any more. GCRequest is a little weird in that its span doesn't mean much as long as it hits the right range. We use the whole range span, so anything that addresses there is also covered.


Comments from Reviewable

tbg added 2 commits April 5, 2018 19:00
The GC queue was already splitting up the set of keys to be deleted into
small parts to avoid overloading the Raft machinery. From there, it was
only a small step to let it dispatch the corresponding GCRequests
eagerly.

The upshot is that now we don't have to hold all of the keys to be
deleted in memory any more.

With appropriate testing, I think this is a change suitable for
inclusion in 2.0.1. "Appropriate testing" includes a roachtest which
creates a large range (larger than available RAM) and fills it with
GC'able data. (A variant of the `DROP` test that increases the range
size suitably should do it). Such a test is filed as cockroachdb#24214.

Release note (performance improvement): lowered the amount of memory
used during garbage collection of old versions.
@tbg tbg force-pushed the fix/chunked-gc branch from f452de3 to 87e85eb Compare April 5, 2018 23:02
@tbg
Copy link
Member Author

tbg commented Apr 5, 2018

bors r+

craig bot added a commit that referenced this pull request Apr 5, 2018
24209: storage: eagerly carry out chunked GC r=tschottdorf a=tschottdorf

The GC queue was already splitting up the set of keys to be deleted into
small parts to avoid overloading the Raft machinery. From there, it was
only a small step to let it dispatch the corresponding GCRequests eagerly.

The upshot is that now we don't have to hold all of the keys to be deleted
in memory any more.

With appropriate testing, I think this is a change suitable for inclusion
in 2.0.1. "Appropriate testing" includes a roachtest which creates a large
range (larger than available RAM) and fills it with GC'able data. (A
variant of the `DROP` test that increases the range size suitably should do
it).

Release note (performance improvement): lowered the amount of memory used
during garbage collection of old versions.
@craig
Copy link
Contributor

craig bot commented Apr 5, 2018

Build succeeded

@craig craig bot merged commit 87e85eb into cockroachdb:master Apr 5, 2018
@tbg tbg deleted the fix/chunked-gc branch April 5, 2018 23:34
@tbg tbg restored the fix/chunked-gc branch April 16, 2018 15:24
@tbg tbg deleted the fix/chunked-gc branch May 8, 2018 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants