storage: write to AbortSpan less, clean up AbortSpan more #42765
The fact that we clear when !Poison seems problematic. This isn't new behavior, but it looks worth fixing (though not necessarily in this PR).
pkg/storage/batcheval/cmd_resolve_intent.go, line 42 at r4 (raw file):
	txnID = t.IntentTxn.ID
}
if status == roachpb.ABORTED {
Could add a comment here that we won't even always write the key in this case, but we can't know this from this code. Your call
pkg/storage/batcheval/transaction_test.go, line 190 at r2 (raw file):
{
	// NOTE: this request doesn't make sense, but we handle it. An abort
	// span shouldn't be present if the transaction is still committable.
In practice the commit would run into the abort span, but this check is outside of the code being tested, right?
pkg/storage/batcheval/transaction_test.go, line 306 at r2 (raw file):
},
{
	name: "resolve intent, txn aborted, no poison, intent missing, abort span present",
Isn't this problematic? If a txn gets pushed first but the corresponding non-poisoning ResolveIntent gets delayed, then the txn gets aborted and poisoning happens, won't the delayed request unpoison accidentally?
The direct blame is broken due to code movement, but I think this is a problem I introduced at some point. Worth filing an issue. The bool should (as always) have been an enum: NOOP, CLEAR, POISON.
Force-pushed from a4e578f to a8c828c.
pkg/storage/batcheval/cmd_resolve_intent.go, line 42 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Could add a comment here that we won't even always write the key in this case, but we can't know this from this code. Your call
Done.
pkg/storage/batcheval/transaction_test.go, line 190 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
In practice the commit would run into the abort span, but this check is outside of the code being tested, right?
Well, an abort span should be possible to write unless the txn is already aborted, in which case the EndTxn would necessarily fail.
pkg/storage/batcheval/transaction_test.go, line 306 at r2 (raw file):
If a txn gets pushed first but the corresponding non-poisoning ResolveIntent gets delayed
I don't think there's a case where a PushTxn triggers a ResolveIntent without a Poison flag (even for PUSH_TIMESTAMP), so I think the only case where a ResolveIntent request can have Poison=false is when it was initiated by the transaction coordinator itself or when it was initiated by GC. Maybe I'm missing something.
The fact that we clear when !Poison seems problematic.
Isn't this kind of the point? When !Poison, we're saying that we're operating as the transaction and we're aware of the ABORT, so we're able to clear. When were you thinking we'd set Poison = NOOP?
LGTM good stuff
pkg/storage/batcheval/cmd_end_transaction.go, line 521 at r7 (raw file):
}
if WriteAbortSpanOnResolve(txn.Status) {
I like txn.Status == ABORTED better. In fact I'd get rid of this function.
pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):
if WriteAbortSpanOnResolve(txn.Status) {
	if err := SetAbortSpan(ctx, evalCtx, batch, ms, txn.TxnMeta, args.Poison); err != nil {
please add a comment that this surprisingly might clear the entry. Or better yet, split the SetAbortSpan() function in two. I think this function is the textbook example of a bool argument anti-pattern.
pkg/storage/batcheval/transaction_test.go, line 27 at r6 (raw file):
// TestSetAbortSpan tests the different ways that request can set, update, and
// delete AbortSpan entries.
func TestSetAbortSpan(t *testing.T) {
excellent test
pkg/storage/batcheval/transaction_test.go, line 43 at r6 (raw file):
txn := roachpb.MakeTransaction("test", txnKey, 0, hlc.Timestamp{WallTime: 1}, 0)
txnAbortSpanEntry := roachpb.AbortSpanEntry{
maybe name this newTxnAbortSpanEntry
pkg/storage/batcheval/transaction_test.go, line 50 at r6 (raw file):
// Used to detect updates to the AbortSpan entry.
txnPrev := txn.Clone()
txnPrev.WriteTimestamp.Add(-1, 0)
can you remind the reader if the priority and write timestamp matter for anything or if they're just testing artifacts?
pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):
// NOTE: this request doesn't make sense, but we handle it. An abort
// span shouldn't be present if the transaction is still committable.
name: "end txn, commit, no poison, abort span present",
let's crash in this case
pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):
// NOTE: this request doesn't make sense, but we handle it. An EndTxn
// should never pass Commit = true and Poison = true.
name: "end txn, commit, poison, abort span present",
let's crash
pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):
name: "resolve intent, txn pending, poison, intent missing, abort span missing", run: func(b engine.ReadWriter, rec EvalContext) error { return resolveIntent(b, rec, roachpb.PENDING, true /* poison */)
resolving an intent with PENDING doesn't make sense, right? Let's crash.
pkg/storage/batcheval/transaction_test.go, line 306 at r2 (raw file):
I don't think there's a case where a PushTxn triggers a ResolveIntent without a Poison flag (even for PUSH_TIMESTAMP),
Heh, it turns out you're right, but only because I had this same thought in 2017 and made that change:
cockroach/pkg/storage/intentresolver/intent_resolver.go
Lines 265 to 278 in 844eac4
// We always poison due to limitations of the API: not poisoning equals
// clearing the AbortSpan, and if our pushee transaction first got pushed
// for timestamp (by us), then (by someone else) aborted and poisoned, and
// then we run the below code, we're clearing the AbortSpan illegaly.
// Furthermore, even if our pushType is not PUSH_ABORT, we may have ended
// up with the responsibility to abort the intents (for example if we find
// the transaction aborted).
//
// To do better here, we need per-intent information on whether we need to
// poison.
if err := ir.ResolveIntents(ctx, resolveIntents,
	ResolveOptions{Wait: false, Poison: true}); err != nil {
	return cleanup, roachpb.NewError(err)
}
I was just mildly confused how this all worked (if we push an intent timestamp, certainly the txn should not get a TransactionAbortedError?) but it looks like even before your PR we only mutate the abort span when aborting. So in a roundabout way, we have already implemented the NOOP option. I don't love the messiness of this, but it seems to be working as intended already: if we're not removing intents, Poison is effectively ignored.
Force-pushed from a8c828c to 3d373eb.
pkg/storage/batcheval/cmd_end_transaction.go, line 521 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I like txn.Status == ABORTED better. In fact I'd get rid of this function.
But there's more to this function than just txn.Status == ABORTED.
pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please add a comment that this surprisingly might clear the entry. Or better yet, split the SetAbortSpan() function in two. I think this function is the textbook example of a bool argument anti-pattern.
How about renaming it to UpdateAbortSpan? I'd rather not pull the condition out to all callers and force them to make decisions that don't really concern them.
pkg/storage/batcheval/transaction_test.go, line 43 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
maybe name this newTxnAbortSpanEntry
Done.
pkg/storage/batcheval/transaction_test.go, line 50 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
can you remind the reader if the priority and write timestamp matter for anything or if they're just testing artifacts?
Good idea, done.
pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
let's crash in this case
If we're all about the KV layer providing an external API (which it sounds like we are) then I don't think we should crash on user input. This case doesn't do anything wrong, we just wouldn't even get to batch evaluation if there's an abort span present.
pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
let's crash
Again, I don't think we should crash, but in this case, I could see some extra input validation and rejecting the request. Done.
pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):
resolving an intent with PENDING doesn't make sense, right?
No, it does. That's what happens during a PUSH_TIMESTAMP push.
Force-pushed from 77d4805 to fbc5c1f.
pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're all about the KV layer providing an external API (which it sounds like we are) then I don't think we should crash on user input. This case doesn't do anything wrong, we just wouldn't even get to batch evaluation if there's an abort span present.
Exactly - we wouldn't get to batch evaluation if there's an abort span. Even if the KV layer were to be treated as an "external" API (where I'd litigate the meaning of "external"), the KV layer isn't the request evaluation functions. Committing a txn with an abort span entry set would be a bug, and a bad one (wouldn't it?). So what are we testing here? The test says // Not aborted, don't touch abort span. What does "not aborted" mean? The txn is aborted. Whether or not the EndTransaction updates the abort span in this case seems like a frivolous concern.
Do you like this test? If I understand it correctly, I think we should get rid of it and just leave a note here about this case not being possible. And if it's easy enough, I do think it'd make sense to refuse the request (or crash), I guess at the evaluateBatch() level. But that higher level function is beyond the concern of this test.
I haven't looked into this PR yet but reflecting on the last two comments I thought I'd share my views:
pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):
What does "not aborted" mean?
Would you like it better if it said "not aborting"?
I think we should get rid of it and just leave a note here about this case not being possible
We could get rid of the test case, but I think there's value in enumerating all combinations of inputs here and just explaining the test cases that should never be hit in practice. It makes it easier to ensure that we've exhaustively tested all valid cases and at least considered the validity of invalid ones. What's the cost of keeping these two cases?
And if it's easy enough, I do think I'd make sense to refuse the request (or crash)
We certainly don't want to read from RocksDB just to perform a local assertion that is already caught further up the stack.
LGTM
pkg/storage/batcheval/cmd_end_transaction.go, line 521 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
But there's more to this function than just txn.Status == ABORTED.
ack
pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How about renaming it to UpdateAbortSpan? I'd rather not pull the condition out to all callers and force them to make decisions that don't really concern them.
UpdateAbortSpan sounds good.
pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):
What's the cost of keeping these two cases?
The biggest cost is paid when this test fails one day and I start wracking my brain about what it is testing.
If you want to enumerate, enumerate it as a comment. The way I see it, the only test to write (if any) is a test that the request is rejected - at whatever layer.
But I'll shut up now.
We certainly don't want to read from RocksDB just to perform a local assertion that is already caught further up the stack.
Right, I wasn't suggesting we check anything in EndTxn. I was saying we should do it in evaluateBatch() - as I now see that we already do.
pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Again, I don't think we should crash, but in this case, I could see some extra input validation and rejecting the request. Done.
Thanks. I think the comment is no longer necessary on this test and the one above, now that we expect an error. Or change it to It is an error to...
pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
resolving an intent with PENDING doesn't make sense, right?
No, it does. That's what happens during a PUSH_TIMESTAMP push.
So what "doesn't make sense" here then? Can you add more words to the comment?
…st.go
This mirrors `cmd_truncate_log.go`.
Up until this point, we had no testing around this. For instance, not a single test would catch it if we removed the deletion path from SetAbortSpan.
…ction request
Fixes cockroachdb#29128.
Before this change, an EndTransaction request sent to roll back a transaction record would not remove any AbortSpan entries, even if its own Poison flag was set to false. This allowed AbortSpan entries to leak. This commit fixes this behavior by removing the AbortSpan entry in this case.
There were concerns raised in cockroachdb#29128 about this being safe. It turns out that we already do this on every Range except the transaction record's Range, so this isn't introducing any new concerns.
Release note (bug fix): AbortSpan records are now cleaned up more aggressively when doing so is known to be safe.
This reduces the frequency of AbortSpan entries that can be abandoned even without a transaction coordinator failure. Specifically, it protects against the case where intent resolution races with a transaction coordinator cleaning up its own transaction record and intents. This can happen for both aborted and committed transactions.
In the first case, a pusher might find a transaction's intent and then find its record to be aborted after that transaction had cleanly rolled back its own intents. Even though the transaction's coordinator had already cleaned up and potentially "unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when it goes to resolve the intents that were already cleaned up. These AbortSpan entries would be fully abandoned and have to wait out the GC.
Similarly, in the second case, the transaction might have committed. Here, the pushee might hit an intent and the txn coordinator might clean up and auto-GC its txn record before the pushee arrives at the txn record. Once the pushee gets there, it would mistake the txn for aborted (by design) and proceed to write an AbortSpan record where the intent it had once observed had been (not by design).
We can tell both of these cases by simply recognizing whether intent resolution actually succeeds. If intent resolution doesn't find an intent, then we might be in either case. That's fine, because we only need to ever poison the abort span if we actually remove an intent that could confuse a zombie transaction.
Release note: None
Return an error if not. Release note: None
Purely a rename. Release note: None
Release note: None
Force-pushed from fbc5c1f to c7bbc4f.
pkg/storage/batcheval/cmd_end_transaction.go, line 522 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
UpdateAbortSpan sounds good.
Done.
pkg/storage/batcheval/transaction_test.go, line 191 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
What's the cost of keeping these two cases?
The biggest cost is paid when this test fails one day and I start wracking my brain about what it is testing.
If you want to enumerate, enumerate it as a comment. The way I see it, the only test to write (if any) is a test that the request is rejected - at whatever layer.
But I'll shut up now.
We certainly don't want to read from RocksDB just to perform a local assertion that is already caught further up the stack.
Right, I wasn't suggesting we check anything in EndTxn. I was saying we should do it in evaluateBatch() - as I now see that we already do.
Ok, I removed the test cases.
pkg/storage/batcheval/transaction_test.go, line 212 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Thanks. I think the comment is no longer necessary on this test and the one above, now that we expect an error. Or change it to It is an error to...
Done.
pkg/storage/batcheval/transaction_test.go, line 262 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
So what "doesn't make sense" here then? Can you add more words to the comment?
I removed those comments. They actually do make sense because of the complexity @tbg brought up above.
🚢 it
bors r+
42765: storage: write to AbortSpan less, clean up AbortSpan more r=nvanbenschoten a=nvanbenschoten
This PR introduces two improvements to our handling of the AbortSpan. It also introduces the first set of comprehensive testing around AbortSpan entry state transitions, which was sorely missing.
This comes after a few different customer issues that at least at first glance appeared to be AbortSpan leaks. There's still more to do to resolve those, mostly by improving GC, but anything we can do here on the frontend to reduce the number of AbortSpan entries that need to be GCed in the first place helps.
### Clean up span on non-poisoning, aborting EndTransaction request
Fixes #29128.
Before this change, an EndTransaction request sent to roll back a transaction record would not remove any AbortSpan entries, even if its own Poison flag was set to false. This allowed AbortSpan entries to leak. This commit fixes this behavior by removing the AbortSpan entry in this case.
There were concerns raised in #29128 about this being safe. It turns out that we already do this on every Range except the transaction record's Range during normal intent resolution, so this isn't introducing any new concerns.
### Only write AbortSpan entries if intents are removed
This reduces the frequency of AbortSpan entries that can be abandoned even without a transaction coordinator failure. Specifically, it protects against the case where intent resolution races with a transaction coordinator cleaning up its own transaction record and intents. This can happen for both aborted and committed transactions.
In the first case, a pusher might find a transaction's intent and then find its record to be aborted after that transaction had cleanly rolled back its own intents. Even though the transaction's coordinator had already cleaned up and potentially "unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when it goes to resolve the intents that were already cleaned up. These AbortSpan entries would be fully abandoned and have to wait out the GC.
Similarly, in the second case, the transaction might have committed. Here, the pushee might hit an intent and the txn coordinator might clean up and auto-GC its txn record before the pushee arrives at the txn record. Once the pushee gets there, it would mistake the txn for aborted (by design) and proceed to write an AbortSpan record where the intent it had once observed had been (not by design).
We can tell both of these cases by simply recognizing whether intent resolution actually succeeds. If intent resolution doesn't find an intent, then we might be in either case. That's fine, because we only need to ever poison the abort span if we actually remove an intent that could confuse a zombie transaction.
43383: storage: stop modifying requests and returning responses from TxnWaitQueue r=nvanbenschoten a=nvanbenschoten
This PR contains three pieces of cleanup that pave the way for pulling the TxnWaitQueue underneath the upcoming `pkg/storage/concurrency` package. It removes all cases where the queue mandates an update to the PushTxnRequest that entered it. It also removes all cases where the queue could field requests directly and return results for them. This restricts the TxnWaitQueue's interface and allows it to be pushed into the concurrency manager abstraction without bloating this new abstraction.
Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded. And happy new year from bors! 🎉
In light of recent events, how nervous would we be about backporting this?
I feel ok about backporting this to v19.2. Almost all of this was testing, and the few non-testing changes that went in here seem safe. Here's the PR: #45553. It was mostly clean, and the parts that weren't were because of renames and other non-behavioral changes.
45573: storage: trigger GC based on SysCount/SysBytes r=ajwerner a=tbg
If there is "a lot" of data in Sys{Bytes,Count}, then we are likely experiencing a large abort span. The abort span is not supposed to become that large, but it does happen and causes stability fallout, usually due to a combination of shortcomings:
1. there's no trigger for GC based on abort span size alone (before this commit)
2. transaction aborts tended to create unnecessary abort span entries, fixed (and 19.2-backported) in: #42765
3. aborting transactions in a busy loop: #38088 (and we suspect this also happens in user apps occasionally)
4. large snapshots would never complete due to the queue time limits (addressed in #44952).
In an ideal world, we would factor the abort span into this method directly, but until then the condition guarding this block will do. At worst, there is some other reason for SysBytes to become that large while also incurring a large SysCount, but I'm not sure how this would happen. The only other things in this span are the versioned range descriptors (which follow regular GC and it's only ever adding a SysCount of one), or transaction records (which expire like abort span records).
Release note (bug fix): Range garbage collection will now trigger based on a large abort span, adding defense-in-depth against ranges growing large (and eventually unstable).
Co-authored-by: Tobias Schottdorf <[email protected]>
48190: sql: inject tenant ID in sqlServerArgs, pass through ExecutorConfig r=nvanbenschoten a=nvanbenschoten
Fixes #47903. Informs #48123.
Also known as "the grand plumbing", this change replaces a few instances of `TODOSQLCodec` in `pkg/sql/sqlbase/index_encoding.go` and watches the house of cards fall apart. It then glues the world back together, this time using a properly injected tenant-bound SQLCodec to encode and decode all SQL table keys.
A tenant ID field is added to `sqlServerArgs`. This is used to construct a tenant-bound `keys.SQLCodec` during server creation. This codec morally lives on the `sql.ExecutorConfig`. In practice, it is also copied onto `tree.EvalContext` and `execinfra.ServerConfig` to help carry it around. SQL code is adapted to use this codec whenever it needs to encode or decode keys.
If all tests pass after this refactor, there is a good chance it got things right. This is because any use of an uninitialized SQLCodec will panic immediately when the codec is first used. This was helpful in ensuring that it was properly plumbed everywhere.
48245: kv: declare write access to AbortSpan on all aborting EndTxn reqs r=nvanbenschoten a=nvanbenschoten
Fixes #43707. Fixes #48046. Fixes #48189.
Part of the change made by #42765 was to clear AbortSpan entries on non-poisoning, aborting EndTxn requests. Specifically, this change was made in 1328787. The change forgot to update the corresponding span declaration logic to reflect the fact that we were now writing to the AbortSpan in cases where we previously weren't. This was triggering an assertion in race builds that tried to catch this kind of undeclared span access.
The assertion failure was very rare because it required the following conditions to all be met:
1. running a test with the race detector enabled
2. a txn (A) must have been aborted by another txn (B)
3. txn B must have cleared an intent on txn A's transaction record range
4. txn A must have noticed and issued a non-poisoning EndTxn(ABORT)
We should backport this when we get a chance (once v20.1.0 has stabilized), but I don't expect that this could actually cause any issues. The AbortSpan update was strictly a matter of performance and we should never be racing with another request that is trying to read the same AbortSpan entry.
Co-authored-by: Nathan VanBenschoten <[email protected]>
This PR introduces two improvements to our handling of the AbortSpan. It also introduces the first set of comprehensive testing around AbortSpan entry state transitions, which was sorely missing.
This comes after a few different customer issues that at least at first glance appeared to be AbortSpan leaks. There's still more to do to resolve those, mostly by improving GC, but anything we can do here on the frontend to reduce the number of AbortSpan entries that need to be GCed in the first place helps.
Clean up span on non-poisoning, aborting EndTransaction request
Fixes #29128.
Before this change, an EndTransaction request sent to roll back a transaction record would not remove any AbortSpan entries, even if its own Poison flag was set to false. This allowed AbortSpan entries to leak. This commit fixes this behavior by removing the AbortSpan entry in this case.
There were concerns raised in #29128 about this being safe. It turns out that we already do this on every Range except the transaction record's Range during normal intent resolution, so this isn't introducing any new concerns.
Only write AbortSpan entries if intents are removed
This reduces the frequency of AbortSpan entries that can be abandoned even without a transaction coordinator failure. Specifically, it protects against the case where intent resolution races with a transaction coordinator cleaning up its own transaction record and intents. This can happen for both aborted and committed transactions.
In the first case, a pusher might find a transaction's intent and then find its record to be aborted after that transaction had cleanly rolled back its own intents. Even though the transaction's coordinator had already cleaned up and potentially "unpoisoned" AbortSpans, the pusher would happily re-introduce AbortSpan records when it goes to resolve the intents that were already cleaned up. These AbortSpan entries would be fully abandoned and have to wait out the GC.
Similarly, in the second case, the transaction might have committed. Here, the pushee might hit an intent and the txn coordinator might clean up and auto-GC its txn record before the pushee arrives at the txn record. Once the pushee gets there, it would mistake the txn for aborted (by design) and proceed to write an AbortSpan record where the intent it had once observed had been (not by design).
We can tell both of these cases by simply recognizing whether intent resolution actually succeeds. If intent resolution doesn't find an intent, then we might be in either case. That's fine, because we only need to ever poison the abort span if we actually remove an intent that could confuse a zombie transaction.