storage: eagerly carry out chunked GC #24209
Conversation
Shouldn't that be disallowed by the backpressure introduced by #21777? This is for 2.1 because it is a step towards increasing range size limits, but before we consider this for 2.0.1 we should do more investigation to figure out how we got such an oversized range to begin with.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
pkg/storage/gc_queue.go, line 517 at r2 (raw file):
Nit: we're in package storage, so no need to repeat it in the doc comments.
pkg/storage/gc_queue.go, line 550 at r2 (raw file):
s/Range/Replica/
pkg/storage/gc_queue.go, line 568 at r2 (raw file):
Comment that we specifically do not batch SetGCThreshold with keys to GC to minimize contention in the command queue.
Comments from Reviewable |
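To make the separate-threshold pattern concrete, here is a minimal sketch of the request shape being discussed. The `gcRequest`/`send` names and the key-count chunking are hypothetical stand-ins, not the actual roachpb/storage API: the threshold bump goes out as its own tiny request up front, and the keys to delete follow in bounded-size chunks.

```go
// Illustrative sketch only; types and the send helper are invented, not the
// real CockroachDB API.
package main

import "fmt"

// gcRequest approximates a GC request that either bumps the GC threshold or
// carries a chunk of keys to delete, but never both.
type gcRequest struct {
	threshold int64    // non-zero: bump the GC threshold, no keys attached
	keys      []string // keys whose old versions should be removed
}

// send stands in for proposing the request on the replica.
func send(req gcRequest) {
	fmt.Printf("threshold=%d, keys=%d\n", req.threshold, len(req.keys))
}

func runGC(threshold int64, keys []string, chunkSize int) {
	// The threshold bump is deliberately not batched with the keys: it is a
	// tiny command that clears the command queue quickly.
	send(gcRequest{threshold: threshold})

	// The actual deletions follow in separate, bounded-size requests.
	for start := 0; start < len(keys); start += chunkSize {
		end := start + chunkSize
		if end > len(keys) {
			end = len(keys)
		}
		send(gcRequest{keys: keys[start:end]})
	}
}

func main() {
	runGC(12345, []string{"a", "b", "c", "d", "e"}, 2)
}
```

The point of keeping the two apart is that the threshold-only command releases its spot in the command queue almost immediately, while each key-carrying request stays small.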
And approved for 2.0.1 following discussion in #24215 |
Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
pkg/storage/gc_queue.go, line 538 at r2 (raw file):
Consider adding a TODO to constrain this header to only what's actually in the GCRequest. Now that we're chunking these requests, we can be a lot more CommandQueue friendly.
pkg/storage/gc_queue.go, line 555 at r2 (raw file):
pkg/storage/gc_queue.go, line 568 at r2 (raw file):
How does this minimize contention? Just because SetGCThreshold will now be very quick? They both still declare a write over the entire range of keys. On this note, what time do these GCRequests indicate to the CommandQueue that they're writing at? We should be able to move the requests into the past to the threshold timestamp such that #14342 ensures that future reads don't block on them. In fact, writes shouldn't need to block on them either, although I think there would be practical complications to allowing that.
pkg/storage/gc_queue.go, line 819 at r2 (raw file):
pkg/storage/gc_queue.go, line 836 at r2 (raw file):
Does this TODO go away?
Comments from Reviewable |
True (and explained in #24215). The comment also applies to scenarios in which we can run with range sizes larger than the available RAM, though, which I hope will be a real option in 2.1 or 2.2 (and which will be what the roachtest will do, except it waits for upreplication before growing the data set to avoid snapshot blow-out).
Review status: 1 of 4 files reviewed at latest revision, 7 unresolved discussions.
pkg/storage/gc_queue.go, line 517 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Forgot to update this when I moved this to storage; done.
pkg/storage/gc_queue.go, line 538 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is actually happening under the hood already; see cockroach/pkg/storage/batcheval/cmd_gc.go, line 36 in fbffeec. Let me know if you still think there should be a TODO.
pkg/storage/gc_queue.go, line 550 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done.
pkg/storage/gc_queue.go, line 555 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
🤦♂️ Done.
pkg/storage/gc_queue.go, line 568 at r2 (raw file):
They declare
Either way, we shouldn't introduce any of these things without a benchmark. Some queue-type workload roachtests are overdue anyway; those should be able to profit from these kinds of tweaks. Filed #24237.
pkg/storage/gc_queue.go, line 819 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done.
pkg/storage/gc_queue.go, line 836 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm not seeing a TODO.
Comments from Reviewable |
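In rough shape, the "happening under the hood" behavior referenced above could look like the following. This is a hypothetical sketch with invented `span`/`spanSet` types, not the real `cmd_gc.go` or batcheval signature: the command declares write spans only for the keys named in the request rather than the whole span carried in the header.

```go
// Hypothetical sketch, not the real cmd_gc.go code: declare spans only for
// the keys the GC request will actually touch.
package main

import "fmt"

type span struct{ key, endKey string }

// spanSet collects the spans a command declares to the command queue.
type spanSet struct{ writes []span }

func (s *spanSet) addWrite(sp span) { s.writes = append(s.writes, sp) }

type gcRequest struct {
	header span     // the (possibly whole-range) span from the batch header
	keys   []string // the keys this request actually garbage collects
}

// declareGCSpans narrows the declaration to per-key point writes whenever the
// request names concrete keys; a key-less threshold bump falls back to the
// header span in this sketch.
func declareGCSpans(req gcRequest, spans *spanSet) {
	if len(req.keys) == 0 {
		spans.addWrite(req.header)
		return
	}
	for _, k := range req.keys {
		spans.addWrite(span{key: k})
	}
}

func main() {
	var s spanSet
	declareGCSpans(gcRequest{header: span{key: "a", endKey: "z"}, keys: []string{"b", "c"}}, &s)
	fmt.Println(len(s.writes), "write spans declared")
}
```

With a per-key declaration, only concurrent commands touching those same keys have to wait on the GC request in the command queue.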
Review status: 1 of 4 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.
pkg/storage/gc_queue.go, line 538 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Ah ok, I didn't know we were doing that. Thanks for explaining. No need for a TODO.
pkg/storage/gc_queue.go, line 568 at r2 (raw file):
That should work, but this has me curious about something else. If GCRequest is declared at
pkg/storage/gc_queue.go, line 836 at r2 (raw file):
Comments from Reviewable |
Review status: 1 of 4 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.
pkg/storage/gc_queue.go, line 568 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think you might be right and we should fix it, but morally this won't happen. Note that we bump the thresholds in the first command, which doesn't actually GC any keys. This means that in practice it's visible to all reads by the time we actually remove data. I think there's a scenario in which a read slips by the GC bump but gets stalled before opening a RocksDB snapshot, so that when it finally does we have actually already removed keys? Setting
Filed https://github.com/cockroachdb/cockroach/issues/new.
pkg/storage/gc_queue.go, line 836 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm not sure about that TODO any more.
Comments from Reviewable |
Release note: None
The GC queue was already splitting up the set of keys to be deleted into small parts to avoid overloading the Raft machinery. From there, it was only a small step to let it dispatch the corresponding GCRequests eagerly. The upshot is that now we don't have to hold all of the keys to be deleted in memory any more.

With appropriate testing, I think this is a change suitable for inclusion in 2.0.1. "Appropriate testing" includes a roachtest which creates a large range (larger than available RAM) and fills it with GC'able data. (A variant of the `DROP` test that increases the range size suitably should do it.) Such a test is filed as cockroachdb#24214.

Release note (performance improvement): lowered the amount of memory used during garbage collection of old versions.
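As a rough illustration of the eager chunking described above — names like `gcEagerly` and `sendGCRequest` and the key-count limit are invented for this sketch; the real gc_queue bounds chunks by byte size and proposes the requests through Raft — each chunk is dispatched as soon as it fills up, so memory stays proportional to the chunk size rather than to the amount of GC'able data in the range.

```go
// Sketch with hypothetical names; Raft proposal and byte-based sizing from
// the real gc_queue are elided here.
package main

import "fmt"

const maxKeysPerChunk = 4 // deliberately tiny for the example

// sendGCRequest stands in for dispatching one GC request.
func sendGCRequest(keys []string) {
	fmt.Printf("dispatched GC request with %d keys\n", len(keys))
}

// gcEagerly walks the GC'able keys and flushes them in bounded chunks as it
// goes, instead of accumulating every key for the range and sending one
// giant request at the end.
func gcEagerly(gcableKeys []string) {
	chunk := make([]string, 0, maxKeysPerChunk)
	for _, k := range gcableKeys {
		chunk = append(chunk, k)
		if len(chunk) == maxKeysPerChunk {
			sendGCRequest(chunk)
			chunk = chunk[:0] // reuse the buffer; nothing else is retained
		}
	}
	if len(chunk) > 0 {
		sendGCRequest(chunk) // final partial chunk
	}
}

func main() {
	gcEagerly([]string{"a", "b", "c", "d", "e", "f", "g", "h", "i"})
}
```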
bors r+ |
24209: storage: eagerly carry out chunked GC r=tschottdorf a=tschottdorf The GC queue was already splitting up the set of keys to be deleted into small parts to avoid overloading the Raft machinery. From there, it was only a small step to let it dispatch the corresponding GCRequests eagerly. The upshot is that now we don't have to hold all of the keys to be deleted in memory any more. With appropriate testing, I think this is a change suitable for inclusion in 2.0.1. "Appropriate testing" includes a roachtest which creates a large range (larger than available RAM) and fills it with GC'able data. (A variant of the `DROP` test that increases the range size suitably should do it). Release note (performance improvement): lowered the amount of memory used during garbage collection of old versions.
Build succeeded |
The GC queue was already splitting up the set of keys to be deleted into small parts to avoid overloading the Raft machinery. From there, it was only a small step to let it dispatch the corresponding GCRequests eagerly. The upshot is that now we don't have to hold all of the keys to be deleted in memory any more.

With appropriate testing, I think this is a change suitable for inclusion in 2.0.1. "Appropriate testing" includes a roachtest which creates a large range (larger than available RAM) and fills it with GC'able data. (A variant of the `DROP` test that increases the range size suitably should do it.)

Release note (performance improvement): lowered the amount of memory used during garbage collection of old versions.