
backport-2.0: storage: filter abort span during split copy #26944

Merged · 1 commit into cockroachdb:release-2.0 on Jun 25, 2018

Conversation

@tbg (Member) commented Jun 25, 2018

Backport 1/1 commits from #26934.

Needed slight adjustments to apply, but nothing in the split logic.

/cc @cockroachdb/release


We suspect that some workloads create enough abort span records to
cause split commands that are too large to be proposed to Raft, in
effect rendering the range permanently unsplittable. This is a problem
because the range will eventually start to backpressure writes (and
even without that, it is a resource hog). Most of the problematic
abort span records would very likely be removed during garbage
collection; however, abort span records aren't counted in any quantity
that triggers GC.

Instead of copying the abort span in its entirety, restrict the copy
to the entries that a GC operation would not remove. In practice, this
means that unless a large number of abort span records is created
every hour, very few records are actually copied, and the split
command stays small.

See #26830.

Release note (bug fix): Avoid a situation in which ranges repeatedly
fail to perform a split.
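
For illustration, here is a minimal Go sketch of the filtering idea described above. The names (`abortSpanEntry`, `filterAbortSpan`) and the exact one-hour constant are assumptions made for this example, not the actual CockroachDB code; the real change applies the same kind of cutoff while copying the abort span into the right-hand range during the split.

```go
package main

import (
	"fmt"
	"time"
)

// txnCleanupThreshold mirrors the roughly one-hour window after which abort
// span records become eligible for GC (assumed value for this sketch).
const txnCleanupThreshold = time.Hour

// abortSpanEntry stands in for an abort span record; only the timestamp
// matters for the filtering decision.
type abortSpanEntry struct {
	txnID     string
	timestamp time.Time
}

// filterAbortSpan keeps only the entries a GC run would not remove, i.e.
// those newer than now minus txnCleanupThreshold. During a split, only these
// entries need to be copied to the new right-hand range, which keeps the
// split command small.
func filterAbortSpan(entries []abortSpanEntry, now time.Time) []abortSpanEntry {
	threshold := now.Add(-txnCleanupThreshold)
	var kept []abortSpanEntry
	for _, e := range entries {
		if e.timestamp.After(threshold) {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	now := time.Now()
	entries := []abortSpanEntry{
		{txnID: "stale", timestamp: now.Add(-2 * time.Hour)},     // GC-able: skipped
		{txnID: "recent", timestamp: now.Add(-10 * time.Minute)}, // still live: copied
	}
	for _, e := range filterAbortSpan(entries, now) {
		fmt.Println(e.txnID) // prints only "recent"
	}
}
```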

@tbg tbg requested a review from a team June 25, 2018 07:41
@cockroach-teamcity (Member):

This change is Reviewable

@bdarnell (Contributor):

:lgtm:


Reviewed 5 of 5 files at r1.
Review status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/storage/replica_command.go, line 1161 at r1 (raw file):

		// as well, but that could create a large Raft command in itself. Plus,
		// we'd have to adjust the stats computations.
		threshold := ts.Add(-txnCleanupThreshold.Nanoseconds(), 0)

When we GC normally, we record the threshold used in RangeTxnSpanGCThresholdKey. Do we need to write that here? (I know the non-txn GC threshold is important for MVCC values; I can't remember whether it really matters for abort span entries.)



@tbg (Member, Author) commented Jun 25, 2018

TFTR!

bors r=bdarnell

Now let's hope that the problem isn't actually having lots and lots of txn records.


Review status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/replica_command.go, line 1161 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

When we GC normally, we record the threshold used in RangeTxnSpanGCThresholdKey. Do we need to write that here? (I know the non-txn GC threshold is important for mvcc values; I can't remember whether it really matters for abort span entries)

This is a good point, but I don't think we have to do anything here. TxnSpanGCThreshold prevents recreation of txn records that were aborted. We're only removing abort span entries here, and with a one-hour threshold, so if an aborted transaction is still around and hasn't noticed the abort at this point, it may fail to read its own write (but it would still not be able to abort). The heartbeat loop would have stopped the client many times over by then, though, so this isn't a real issue.



craig bot pushed a commit that referenced this pull request Jun 25, 2018
26944: backport-2.0: storage: filter abort span during split copy r=bdarnell a=tschottdorf

Co-authored-by: Tobias Schottdorf <[email protected]>
@craig (bot) commented Jun 25, 2018

Build succeeded

@craig craig bot merged commit 69ab756 into cockroachdb:release-2.0 Jun 25, 2018
@tbg tbg deleted the backport2.0-26934 branch July 10, 2018 07:13