Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: support STAGING transactions in kv protocol and client #36012

Merged

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Mar 21, 2019

This PR polishes off the protocol and client level changes from #35165 to allow transactions to be moved to the STAGING state during parallel commits. To do so, the change makes two major contributions.

First, it adds support to EndTransaction request and response for moving a transaction record to the STAGING status in addition to the ABORTED and COMMITTED status. This is done in a way that interacts elegantly with a transaction's in-flight writes and allows the request to move a transaction record directly to the COMMITTED status when all of the in-flight writes are on the same range as the transaction record.

The second contribution is that the change adds full support to the txnCommitter interceptor to issue parallel commits when possible and then to launch the asynchronous portion of marking a transaction as explicitly committed if it determines that a transaction is implicitly committed. This is likely the area of the change where reviewers will want to look first, as it incorporates the bulk of the parallel commits algorithm and also serves as its primary documentation.

The final remaining piece of work after this change is to adjust DistSender to actually send EndTransaction requests in parallel with writes in the same batch and with QueryIntents for previous writes. This change makes it safe to do so, but doesn't actually add any logic to DistSender. There are a few complications to that piece and it's also the least interesting part of this change, which is why I have saved it for last.

@nvanbenschoten nvanbenschoten requested review from ajwerner and tbg March 21, 2019 00:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch 4 times, most recently from 742f533 to 07f6a10 Compare March 27, 2019 22:21
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch from 07f6a10 to 06f74b4 Compare April 2, 2019 19:37
@nvanbenschoten
Copy link
Member Author

Ok, this is ready for review when possible. The one major questions I have that I'd appreciate input from @tbg, @ajwerner, and maybe @andreimatei on is about how we should represent a transaction's "intent spans" and "in-flight requests". This comes up both in the EndTransactionRequest and in the Transaction record. The question is a continuation of the thread from #35763 (review). I'll continue to work on this, but it's something to keep in mind when reviewing the PR.

Also, I'm not planning on merging this until after the DistSender changes are also ready to go in because on its own this will cause a performance hit.

@nvanbenschoten nvanbenschoten marked this pull request as ready for review April 2, 2019 19:43
@nvanbenschoten nvanbenschoten requested review from a team April 2, 2019 19:43
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, 2 of 2 files at r2, 28 of 28 files at r3, 17 of 30 files at r4, 2 of 2 files at r5, 28 of 29 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @tbg)


pkg/kv/txn_interceptor_committer.go, line 147 at r3 (raw file):

		// the status back to PENDING.
		if txn := pErr.GetTxn(); txn != nil && txn.Status == roachpb.STAGING {
			pErr.SetTxn(cloneWithStatus(txn, roachpb.PENDING))

Just to check my understanding, at this point the serialized Txn record will be STAGING but we know here that it's not committed so our in-memory view of the Txn is PENDING?


pkg/kv/txn_interceptor_committer.go, line 151 at r3 (raw file):

		// Same deal with MixedSuccessErrors.
		// TODO(nvanbenschoten): Merge code? Wait for MixedSuccessError to go away?
		if aPSErr, ok := pErr.GetDetail().(*roachpb.MixedSuccessError); ok {

What is aPSErr supposed to mean?


pkg/kv/txn_interceptor_committer.go, line 255 at r3 (raw file):

// committing EndTransaction in parallel with other writes. If so, it returns
// the set of writes that may still be in-flight when the EndTransaction is
// evaluated. These writes should be declared are requirements for the

either should be declared as or are


pkg/kv/txn_interceptor_committer.go, line 405 at r6 (raw file):

	ba.Add(&et)

	const maxAttempts = 5

Comment on why retries might happen and why it is a good idea?


pkg/kv/txn_interceptor_committer_test.go, line 194 at r3 (raw file):

	mockSender.MockSend(func(ba roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error) {
		require.Equal(t, 3, len(ba.Requests))

require.Len is a thing


pkg/kv/txn_interceptor_pipeliner.go, line 68 at r3 (raw file):

// that have been proposed but not proven to have succeeded are first checked
// before considering the transaction committed. These async writes are referred
// to as "outstanding writes" and this process of proving that an outstanding

should we call them "in-flight writes" for consistency? Can you help elucidate the difference between "outstanding" and "in-flight" for me? Both terms are used in a sentence below. From what I gather one is the set of writes stored in the BTree of the txnPipeliner and another is the map of the EndTransaction populated by the txnComitter. The mixed terminology is a little bit confusing.


pkg/roachpb/data.go, line 980 at r3 (raw file):

	}
	if !t.Status.IsFinalized() {
		if (t.Epoch < o.Epoch) || (t.Epoch == o.Epoch && o.Status != PENDING) {

putting a pin in this to remember to think harder about this condition


pkg/storage/replica_test.go, line 10672 at r3 (raw file):

				return sendWrappedWithErr(etH, &et)
			},
			expError: "TransactionAbortedError(ABORT_REASON_ALREADY_COMMITTED_OR_ROLLED_BACK_POSSIBLE_REPLAY)",

Is TransactionAbortedError a good name for this error?


pkg/storage/replica_test.go, line 11114 at r3 (raw file):

			},
			run: func(txn *roachpb.Transaction, _ hlc.Timestamp) error {
				rt := recoverTxnArgs(txn, true /* implicitlyCommitted */)

This got renamed to allWritesFound


pkg/storage/replica_write.go, line 506 at r3 (raw file):

				et = etClone

				reqsClone := append([]roachpb.RequestUnion(nil), ba.Requests...)

preallocate this because you know the size


pkg/storage/replica_write.go, line 494 at r6 (raw file):

	}

	origET := et

Can you comment about what's going on here?


pkg/storage/batcheval/cmd_end_transaction.go, line 242 at r3 (raw file):

		case roachpb.PENDING:
			if h.Txn.Epoch < reply.Txn.Epoch {

Given the code duplication you could make this case PENDING, STAGING and put a

if reply.Txn.Status == roachpb.STAGING {
   // Remove the existing in-flight writes. They may be replaced.
   reply.Txn.InFlightWrites = nil
}

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are questions more than anything

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @tbg)

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch 2 times, most recently from f9d3fc5 to 2f1d26c Compare April 17, 2019 18:41
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)


pkg/kv/txn_interceptor_committer.go, line 147 at r3 (raw file):

Previously, ajwerner wrote…

Just to check my understanding, at this point the serialized Txn record will be STAGING but we know here that it's not committed so our in-memory view of the Txn is PENDING?

Yes, exactly! This is really subtle so I'll improve the comment.


pkg/kv/txn_interceptor_committer.go, line 151 at r3 (raw file):

Previously, ajwerner wrote…

What is aPSErr supposed to mean?

I think it's "a partial success error". I got that from here. I really need to just address the TODO and get rid of MixedSuccessErrors entirely.


pkg/kv/txn_interceptor_committer.go, line 255 at r3 (raw file):

Previously, ajwerner wrote…

either should be declared as or are

Done.


pkg/kv/txn_interceptor_committer.go, line 405 at r6 (raw file):

Previously, ajwerner wrote…

Comment on why retries might happen and why it is a good idea?

I removed this. I don't have a strong enough understanding of when retries would be a good idea. We can add them back in if we see a need for them.


pkg/kv/txn_interceptor_committer_test.go, line 194 at r3 (raw file):

Previously, ajwerner wrote…

require.Len is a thing

Nice! Switched where appropriate.


pkg/kv/txn_interceptor_pipeliner.go, line 68 at r3 (raw file):

Previously, ajwerner wrote…

should we call them "in-flight writes" for consistency? Can you help elucidate the difference between "outstanding" and "in-flight" for me? Both terms are used in a sentence below. From what I gather one is the set of writes stored in the BTree of the txnPipeliner and another is the map of the EndTransaction populated by the txnComitter. The mixed terminology is a little bit confusing.

Yeah, I think this is a good point. The distinction here is that:
"outstanding" = writes that have not completed by the time an EndTransaction is issued
"in-flight" = writes that are in-flight at the same time as an EndTransaction

This distinction here makes sense given the development progression of txn pipelining and now parallel commits, but as a reader of this code I can see how the distinction would feel artificial. There's definitely a draw to unifying the concepts.

If we were to unify them, which name would you like better? "Outstanding"? "In-flight"? "Concurrent"? Something else? I'm into the idea of making the change, I just don't want to change the 200 instances of "outstanding" to something else before we've settled on something better. @tbg do you have an opinion?


pkg/storage/replica_test.go, line 10672 at r3 (raw file):

Previously, ajwerner wrote…

Is TransactionAbortedError a good name for this error?

TransactionAbortedErrors arise in these situations because the transaction record is committed and already GCed by the time the second request reaches it. At that point, there's no way to tell whether the transaction was committed or aborted (unlike in the "without eager gc" cases below). All we really know if that the transaction record can't be created again in the future.

Fortunately, it doesn't really matter because transaction records can't be GCed until after all of their intents have already been resolved. This is explained in pretty good detail here.


pkg/storage/replica_test.go, line 11114 at r3 (raw file):

Previously, ajwerner wrote…

This got renamed to allWritesFound

Ah good catch, but other way around. I renamed it to implicitlyCommitted in the previous PR and didn't catch all of these during the rebase. Done.


pkg/storage/replica_write.go, line 506 at r3 (raw file):

Previously, ajwerner wrote…

preallocate this because you know the size

That doesn't seem to make a difference:

var slice = make([]int, 512)

func BenchmarkPreAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = append(make([]int, 0, len(slice)), slice...)
	}
}

func BenchmarkNoPreAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = append([]int(nil), slice...)
	}
}
name          time/op
PreAlloc-4     722ns ± 1%
NoPreAlloc-4   723ns ± 4%

name          alloc/op
PreAlloc-4    4.10kB ± 0%
NoPreAlloc-4  4.10kB ± 0%

name          allocs/op
PreAlloc-4      1.00 ± 0%
NoPreAlloc-4    1.00 ± 0%

pkg/storage/replica_write.go, line 494 at r6 (raw file):

Previously, ajwerner wrote…

Can you comment about what's going on here?

Done.


pkg/storage/batcheval/cmd_end_transaction.go, line 242 at r3 (raw file):

Previously, ajwerner wrote…

Given the code duplication you could make this case PENDING, STAGING and put a

if reply.Txn.Status == roachpb.STAGING {
   // Remove the existing in-flight writes. They may be replaced.
   reply.Txn.InFlightWrites = nil
}

Done.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch from 2f1d26c to 5791929 Compare April 17, 2019 18:43
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch from 5791929 to a635e1c Compare May 2, 2019 19:58
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request May 3, 2019
This was suggested in cockroachdb#36012. The following commit will make a
distinction between "in-flight" writes and a "write footprint",
so it makes sense to first perform this global rename.

Release note: None
@nvanbenschoten
Copy link
Member Author

The final two points to address here before it is ready to review are:

  1. restructuring the in-flight writes in the EndTransactionRequest and Transaction protos.
  2. unifying the concepts of "outstanding" pipelined writes and "in-flight" writes during a parallel commit.

Both of these are addressed in #37302, so that PR is the final precursor to this one.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request May 3, 2019
This commit addresses the impedance mismatch discussed in
cockroachdb#35763 (review)
between tracking in-flight writes and tracking a transaction's intent
"footprint". These are two related but fundamentally different concepts
and it's best to be explicit about how they relate and how they differ.

To this end, this commit moves all intent tracking into the txnPipeliner.
The pipeliner now tracks two separate sets: the "in-flight writes" and the
"write footprint". The former is morally a subset of the latter, but for
the sake of avoiding duplication, in-flight writes are not included in the
write footprint. Instead, the write footprint includes:
1. write from the current epoch that have already succeeded (point writes
   that were pipelined and already proved and ranged writes)
2. writes from the current epoch that may or may not have succeeded
3. all writes from prior epochs

Structuring the two sets this way has a major benefit -- it is exactly
how we would like to represent the writes that a transaction has performed
on `EndTransactionRequests` and on STAGING `Transaction` records. This
will simplify client-logic in cockroachdb#36012 significantly.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request May 7, 2019
This was suggested in cockroachdb#36012. The following commit will make a
distinction between "in-flight" writes and a "write footprint",
so it makes sense to first perform this global rename.

Release note: None
This allows QueryIntent to behave correctly even as a transaction's
provisional commit timestamp grows.

Release note: None
Parallel commits will want intents that are found to be missing to be prevented
from ever being written. This is true both for the committing transaction itself
and other transactions attempting to recover a STAGING transaction. In fact, all
use cases of QueryIntent will now want the PREVENT IfMissingBehavior. However, a
committing transaction will still also want the existing behavior of returning
an error if the intent is missing, which is provided by the RETURN_ERROR
IfMissingBehavior. This indicates that the options in the IfMissingBehavior are
not sufficiently orthogonal to justify an enum.

This change removes the enum and replaces it with an `ErrorIfMissing` bool. This
is backwards compatible because up until now, no has used the PREVENT behavior.

Release note: None
These checks usually go hand-in-hand, but the output of the error
check is almost always more useful when tests fail.

Release note: None
roachpb.EndTransactionRequest is 120 bytes large. We shouldn't pass it
by value to these leaf functions.

Release note: None
This commit adjusts the representation of `InFlightWrites` in Transaction
protos in accordance with the thinking from cockroachdb#37302. The structure no
longer points into the `IntentSpans`. Instead, it lives alongside the
`IntentSpans`, which logically extends to include all of the `InFlightWrites`
but doesn't in practice to avoid duplication.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch from d7ed002 to c7bad2b Compare May 15, 2019 13:26
@nvanbenschoten
Copy link
Member Author

@tbg I decided I didn't want to sit on this until #37469 is ready because this large of a PR tends to rot and I don't intend to finalize that PR until next week. I added in one more commit that temporarily disables parallel commits, which we can revert when #37469 goes in.

Here's a note from that commit message:

The fear of merging #36012 without #37469 and without disabling parallel
commits is that we would then be adding a new transaction state transition
without actually performing implicit commits in parallel with the rest of
the transaction's writes. This appears to slow down nightly benchmarks by
about 5-10%.

With that extra commit, this PR is now eligible to be merged into master.

@tbg
Copy link
Member

tbg commented May 15, 2019

Sounds good, I didn't think that was easy enough to do or I would've suggested so myself during review. Going to give this a look later today.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch from c7bad2b to 987cc5b Compare May 15, 2019 14:10
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request May 15, 2019
This commit allows us to merge cockroachdb#36012 without waiting for cockroachdb#37469. We can
revert this commit once the changes in cockroachdb#37469 land.

The fear of merging cockroachdb#36012 without cockroachdb#37469 and without disabling parallel
commits is that we would then be adding a new transaction state transition
without actually performing implicit commits in parallel with the rest of
the transaction's writes. This appears to slow down nightly benchmarks by
about 5-10%.

Release note: None
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 9 files at r15, 3 of 3 files at r16.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_write.go, line 527 at r15 (raw file):

				// Ranged write or read. See below.
			}
		}

assert len(inflight) >= len(writes)


pkg/storage/replica_write.go, line 572 at r15 (raw file):

		//
		// This reduces the complexity of this loop from O(n*log(m)) down to
		// O(n) where n is the number of requests in the batch and m is the

point writes

#batch + #pointwritesinbatch * log inflight

This PR polishes off the protocol and client level changes from cockroachdb#35165
to allow transactions to be moved to the STAGING state during parallel
commits. To do so, the change makes two major contributions.

First, it adds support to EndTransaction request and response for moving
a transaction record to the STAGING status in addition to the ABORTED
and COMMITTED status. This is done in a way that interacts elegantly
with a transaction's in-flight writes and allows the request to move a
transaction record directly to the COMMITTED status when all of the
in-flight writes are on the same range as the transaction record.

The second contribution is that the change adds full support to the
txnCommitter interceptor to issue parallel commits when possible and
then to launch the asynchronous portion of marking a transaction as
explicitly committed if it determines that a transaction is implicitly
committed. This is likely the area of the change where reviewers will
want to look first, as it incorporates the bulk of the parallel commits
algorithm and also serves as its primary documentation.

The final remaining piece of work after this change is to adjust
DistSender to actually send EndTransaction requests in parallel
with writes in the same batch and with QueryIntents for previous
writes. This change makes it safe to do so, but doesn't actually
add any logic to DistSender. There are a few complications to that
piece and it's also the least interesting part of this change,
which is why I have saved it for last.

Release note: None
Just code movement.

Release note: None
This commit allows us to merge cockroachdb#36012 without waiting for cockroachdb#37469. We can
revert this commit once the changes in cockroachdb#37469 land.

The fear of merging cockroachdb#36012 without cockroachdb#37469 and without disabling parallel
commits is that we would then be adding a new transaction state transition
without actually performing implicit commits in parallel with the rest of
the transaction's writes. This appears to slow down nightly benchmarks by
about 5-10%.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/txnParallelCommitter2 branch from 987cc5b to 9d874cc Compare May 15, 2019 18:41
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: TFTRs!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)


pkg/storage/replica_write.go, line 527 at r15 (raw file):

Previously, tbg (Tobias Grieger) wrote…

assert len(inflight) >= len(writes)

Done.


pkg/storage/replica_write.go, line 572 at r15 (raw file):

Previously, tbg (Tobias Grieger) wrote…

point writes

#batch + #pointwritesinbatch * log inflight

Done.

@nvanbenschoten
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request May 15, 2019
36012: kv: support STAGING transactions in kv protocol and client r=nvanbenschoten a=nvanbenschoten

This PR polishes off the protocol and client level changes from #35165 to allow transactions to be moved to the STAGING state during parallel commits. To do so, the change makes two major contributions.

First, it adds support to `EndTransaction` request and response for moving a transaction record to the STAGING status in addition to the ABORTED and COMMITTED status. This is done in a way that interacts elegantly with a transaction's in-flight writes and allows the request to move a transaction record directly to the COMMITTED status when all of the in-flight writes are on the same range as the transaction record.

The second contribution is that the change adds full support to the `txnCommitter` interceptor to issue parallel commits when possible and then to launch the asynchronous portion of marking a transaction as explicitly committed if it determines that a transaction is implicitly committed. This is likely the area of the change where reviewers will want to look first, as it incorporates the bulk of the parallel commits algorithm and also serves as its primary documentation.

The final remaining piece of work after this change is to adjust DistSender to actually send `EndTransaction` requests in parallel with writes in the same batch and with `QueryIntents` for previous writes. This change makes it safe to do so, but doesn't actually add any logic to DistSender. There are a few complications to that piece and it's also the least interesting part of this change, which is why I have saved it for last.

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 15, 2019

Build succeeded

@craig craig bot merged commit 9d874cc into cockroachdb:master May 15, 2019
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/txnParallelCommitter2 branch May 20, 2019 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants