backport-2.0: storage: filter abort span during split copy #26944

tbg · 2018-06-25T07:41:55Z

Backport 1/1 commits from #26934.

Needed slight adjustments to apply, but nothing in the split logic.

/cc @cockroachdb/release

We are suspecting that some workloads create a sufficient number of
abort span records to cause split commands that are too large to be
proposed to Raft, in effect rendering the range permanently
unsplittable. This is a problem since the range will at some point
start to backpressure writes (and even without that, it's a resource
hog). Most of the problematic abort span records would very likely
be removed during garbage collection; however, abort span records
aren't counted in any quantity that triggers GC.

Instead of copying the entirety of the abort span, restrict to the
entries that would not be removed by a GC operation. In practice,
this means that unless a high number of abort span records are
created every hour, very few records will actually be copied, and
in return the split command size should be small.

See #26830.

Release note (bug fix): Avoid a situation in which ranges repeatedly
fail to perform a split.

We are suspecting that some workloads create a sufficient number of abort span records to cause split commands that are too large to be proposed to Raft, in effect rendering the range permanently unsplittable. This is a problem since the range will at some point start to backpressure writes (and even without that, it's a resource hog). Most of the problematic abort span records would very likely be removed during garbage collection; however, abort span records aren't counted in any quantity that triggers GC. Instead of copying the entirety of the abort span, restrict to the entries that would not be removed by a GC operation. In practice, this means that unless a high number of abort span records are created every hour, very few records will actually be copied, and in return the split command size should be small. See cockroachdb#26830. Release note (bug fix): Avoid a situation in which ranges repeatedly fail to perform a split.

cockroach-teamcity · 2018-06-25T07:42:00Z

This change is

bdarnell · 2018-06-25T14:38:40Z

Reviewed 5 of 5 files at r1.
Review status: complete! 0 of 0 LGTMs obtained (and 1 stale)

pkg/storage/replica_command.go, line 1161 at r1 (raw file):

		// as well, but that could create a large Raft command in itself. Plus,
		// we'd have to adjust the stats computations.
		threshold := ts.Add(-txnCleanupThreshold.Nanoseconds(), 0)

When we GC normally, we record the threshold used in RangeTxnSpanGCThresholdKey. Do we need to write that here? (I know the non-txn GC threshold is important for mvcc values; I can't remember whether it really matters for abort span entries)

Comments from Reviewable

tbg · 2018-06-25T14:55:10Z

TFTR!

bors r=bdarnell

Now let's hope that the problem isn't actually having lots and lots of txn records.

Review status: complete! 1 of 0 LGTMs obtained

pkg/storage/replica_command.go, line 1161 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

When we GC normally, we record the threshold used in RangeTxnSpanGCThresholdKey. Do we need to write that here? (I know the non-txn GC threshold is important for mvcc values; I can't remember whether it really matters for abort span entries)

This is a good point, but I don't think we have to do anything here. TxnSpanGCThreshold prevents recreation of txn records that were aborted. We're only removing abort span entries here, and with a one hour threshold, so if any aborted transaction is still around and hasn't noticed it at this point then it may fail to read its own write (but it would still not be able to abort). The heartbeat loop would have stopped the client many times over, though, so this isn't a real issue.

Comments from Reviewable

26944: backport-2.0: storage: filter abort span during split copy r=bdarnell a=tschottdorf Backport 1/1 commits from #26934. Needed slight adjustments to apply, but nothing in the split logic. /cc @cockroachdb/release --- We are suspecting that some workloads create a sufficient number of abort span records to cause split commands that are too large to be proposed to Raft, in effect rendering the range permanently unsplittable. This is a problem since the range will at some point start to backpressure writes (and even without that, it's a resource hog). Most of the problematic abort span records would very likely be removed during garbage collection; however, abort span records aren't counted in any quantity that triggers GC. Instead of copying the entirety of the abort span, restrict to the entries that would not be removed by a GC operation. In practice, this means that unless a high number of abort span records are created every hour, very few records will actually be copied, and in return the split command size should be small. See #26830. Release note (bug fix): Avoid a situation in which ranges repeatedly fail to perform a split. Co-authored-by: Tobias Schottdorf <[email protected]>

craig · 2018-06-25T15:34:44Z

Build succeeded

GitHub CI (Cockroach)

tbg requested a review from a team June 25, 2018 07:41

craig bot merged commit 69ab756 into cockroachdb:release-2.0 Jun 25, 2018

tbg deleted the backport2.0-26934 branch July 10, 2018 07:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

backport-2.0: storage: filter abort span during split copy #26944

backport-2.0: storage: filter abort span during split copy #26944

tbg commented Jun 25, 2018

cockroach-teamcity commented Jun 25, 2018

bdarnell commented Jun 25, 2018

tbg commented Jun 25, 2018

craig bot commented Jun 25, 2018

backport-2.0: storage: filter abort span during split copy #26944

backport-2.0: storage: filter abort span during split copy #26944

Conversation

tbg commented Jun 25, 2018

cockroach-teamcity commented Jun 25, 2018

bdarnell commented Jun 25, 2018

tbg commented Jun 25, 2018

craig bot commented Jun 25, 2018

Build succeeded