backport-2.0: storage: filter abort span during split copy #26944
We suspect that some workloads create enough abort span records to cause split commands that are too large to be proposed to Raft, in effect rendering the range permanently unsplittable. This is a problem since the range will at some point start to backpressure writes (and even without that, it's a resource hog). Most of the problematic abort span records would very likely be removed during garbage collection; however, abort span records aren't counted in any quantity that triggers GC.

Instead of copying the entirety of the abort span, restrict the copy to the entries that would not be removed by a GC operation. In practice, this means that unless a high number of abort span records are created every hour, very few records will actually be copied, and in return the split command size should be small.

See cockroachdb#26830.

Release note (bug fix): Avoid a situation in which ranges repeatedly fail to perform a split.
Reviewed 5 of 5 files at r1.

pkg/storage/replica_command.go, line 1161 at r1:
When we GC normally, we record the threshold used in
TFTR!

bors r=bdarnell

Now let's hope that the problem isn't actually having lots and lots of txn records.

pkg/storage/replica_command.go, line 1161 at r1:
Previously, bdarnell (Ben Darnell) wrote…
This is a good point, but I don't think we have to do anything here. TxnSpanGCThreshold prevents recreation of txn records that were aborted. We're only removing abort span entries here, and with a one hour threshold, so if any aborted transaction is still around and hasn't noticed it at this point then it may fail to read its own write (but it would still not be able to abort). The heartbeat loop would have stopped the client many times over, though, so this isn't a real issue.
26944: backport-2.0: storage: filter abort span during split copy r=bdarnell a=tschottdorf

Backport 1/1 commits from #26934.
Needed slight adjustments to apply, but nothing in the split logic.
/cc @cockroachdb/release

Co-authored-by: Tobias Schottdorf <[email protected]>
Build succeeded