backupccl: avoid hanging forever on intents #51624
Conversation
I'm still working on the test for this, but wanted to get it out for an early look in the meantime.
I can take another look when you get tests going (though, I assume, it shouldn't be that bad)
Reviewed 5 of 7 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @pbardea)
pkg/ccl/backupccl/backup_processor.go, line 180 at r1 (raw file):
header.UserPriority = roachpb.MaxUserPriority
}
log.Infof(ctx, "sending ExportRequest for span %s (attempt %d, priority %d)",
  span.span, span.attempts+1, header.UserPriority.String())
wrap very long line(s)?
pkg/ccl/backupccl/backup_processor.go, line 183 at r1 (raw file):
if pErr != nil {
  if err := pErr.Detail.GetTransactionRetry(); err != nil && err.Reason == roachpb.RETRY_REASON_UNKNOWN {
Should this check be moved & encapsulated in some sort of utility function?
Perhaps move it to export.go -- where it is used?
I was making the test much harder than it needed to be with multiple goroutines, but realized all we need is 4 things: begin, update, backup (with timeout), commit. Added a nice simple test.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @pbardea)
pkg/ccl/backupccl/backup_processor.go, line 180 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
wrap very long line(s)?
Done.
pkg/ccl/backupccl/backup_processor.go, line 183 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
if pErr != nil {
  if err := pErr.Detail.GetTransactionRetry(); err != nil && err.Reason == roachpb.RETRY_REASON_UNKNOWN {
Should this check be moved & encapsulated in some sort of utility function?
Perhaps move it to export.go -- where it is used?
eh, I realized it could just be simplified in place.
with some nits. I'd like somebody from KV to chime in on the `Reason`.
Reviewed 1 of 7 files at r1, 3 of 4 files at r2.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @miretskiy, and @pbardea)
pkg/ccl/backupccl/backup_processor.go, line 184 at r2 (raw file):
rawRes, pErr := kv.SendWrappedWith(ctx, flowCtx.Cfg.DB.NonTransactionalSender(), header, req)
if pErr != nil {
  if err := pErr.Detail.GetTransactionRetry(); err != nil {
I'd love to see a brief comment here pointing at the `FailOnIntents` field of the request and how it led to this retry error.
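One way the requested comment might read, layered onto the check quoted above (the surrounding code is as in the diff; the comment wording and the placeholder body are only a suggestion):

```
rawRes, pErr := kv.SendWrappedWith(ctx, flowCtx.Cfg.DB.NonTransactionalSender(), header, req)
if pErr != nil {
	// We set FailOnIntents on the ExportRequest, so instead of blocking on
	// intent resolution the store replies with this retry error. Treat it as
	// "this span has an unresolved intent; re-queue it and come back later"
	// rather than as a fatal error.
	if err := pErr.Detail.GetTransactionRetry(); err != nil {
		// ... push the span to the back of the queue ...
	}
}
```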
pkg/ccl/backupccl/backup_test.go, line 4367 at r2 (raw file):
// observe that the backup aborted our txn.
require.Error(t, tx.Commit())
nit: pick this error apart and validate that it has the right code inside.
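A sketch of what that assertion could look like, assuming the test drives the transaction through database/sql with the lib/pq driver (the helper name, the driver choice, and the exact SQLSTATE check are assumptions, not the PR's code):

```
package backupsketch

import (
	"testing"

	"github.com/lib/pq"
	"github.com/stretchr/testify/require"
)

// requireRetryError asserts that the commit failed with SQLSTATE 40001
// (serialization_failure), which is what a transaction pushed/aborted by the
// high-priority backup should report through lib/pq.
func requireRetryError(t *testing.T, commitErr error) {
	require.Error(t, commitErr)
	pqErr, ok := commitErr.(*pq.Error)
	require.True(t, ok, "expected *pq.Error, got %T", commitErr)
	require.Equal(t, pq.ErrorCode("40001"), pqErr.Code)
}
```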
pkg/ccl/storageccl/export.go, line 160 at r2 (raw file):
if err != nil {
  if err, ok := err.(*roachpb.WriteIntentError); ok && args.FailOnIntents {
    wrapped := roachpb.NewTransactionRetryError(roachpb.RETRY_REASON_UNKNOWN, err.Error())
I'd support creating a new retry reason though I'd like to have somebody on the KV team aware and supportive of that too and so I'm nominating @andreimatei to help decide if that's a good idea.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @dt and @pbardea)
pkg/ccl/backupccl/backup_test.go, line 4359 at r2 (raw file):
// limit how long we'll try to backup so if it hangs we find out.
timeoutCtx, fn := context.WithTimeout(ctx, time.Second*10)
perhaps a bit higher? particularly w/ stressrace it might take longer?
Or, just disable under stress/race?
I'd like somebody from KV to chime in on the Reason
I'd encourage us to avoid a TransactionRetryError here. We're not running in a transaction so it's entirely unexpected for a TransactionRetryError to be returned. I'm actually surprised this doesn't blow up on the client when passing through a non-transactional sender. TransactionRetryError has a specific meaning, it's not just a generic RetryError, so if we're going to go with this approach, introducing a new error type would be preferable.
That said, I still don't think I fully understand the approach here and what it's optimizing for. We discussed on slack the alternative of simply running all BACKUPS as high-priority, which avoids all of this complexity and is much easier to think about. Of course, this could be disruptive if users don't run BACKUP with AS OF SYSTEM TIME '-30s', which David mentioned was "a good best practice" but I assume is not something we actually expect most users to be doing (right?). But I don't actually think the approach we have here is guaranteed to be any less disruptive. We retry 3 times at normal priority before upgrading to high priority, but there's no lower bound on how long those retries will take or how long we'll wait between retries. So it's still possible given a small enough backup that we end up pushing transactions that are arbitrarily up-to-date even with all of this added complexity. Don't we want a hard guarantee that we don't disrupt transactions below some threshold of staleness?
I wonder if there's any way to salvage an "always backup with high priority" approach. If so, I think that's worth exploring. For instance, what if we just ... waited until the backup timestamp was more than 10 seconds stale before sending the first Export request? For users using the best practice of running BACKUP with AS OF SYSTEM TIME, this would be free. For users not doing this, this would place a 10-second penalty on the latency of a backup. Unless we're optimizing for backup speed on very small tables, which I don't think we are, this penalty is likely completely negligible. And even then, follow the best practices as a user and you'll be fine.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @dt and @pbardea)
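For reference, the "wait until the backup timestamp is sufficiently stale before the first ExportRequest" idea is mechanically simple; a minimal sketch (the function name and fixed threshold are illustrative, and this is not what the PR ends up implementing):

```
package backupsketch

import (
	"context"
	"time"
)

// waitUntilStale blocks until the backup's read-as-of time is at least
// minStaleness in the past, so that running at high priority afterwards is
// unlikely to push recently started foreground transactions.
func waitUntilStale(ctx context.Context, readAsOf time.Time, minStaleness time.Duration) error {
	remaining := minStaleness - time.Since(readAsOf)
	if remaining <= 0 {
		// Already stale enough, e.g. BACKUP ... AS OF SYSTEM TIME '-30s'.
		return nil
	}
	select {
	case <-time.After(remaining):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```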
I don't particularly like slapping a blanket latency penalty on every non-backdated backup: it unnecessarily penalizes backups on tables that weren't seeing any traffic, and it isn't even clear what the right amount would be -- 1s might be appropriate in some scenarios but 30s in others, depending on the workload we're trying to accommodate. I'd much rather only penalize a backup that actually hits an intent. To do that, we'd still need a way to signal "I hit an intent" back to the sender. This change tries to use the fact that we usually have other work we can do in the meantime, instead of just sitting on our hands and doing nothing while we wait. As you point out, for very small backups we may get back to the range quickly and thus run at high priority quite soon, aborting foreground traffic -- I should have included a TODO documenting our plan there: we're thinking that on subsequent retries we'd wait until the back-date has grown large enough before trying again.

Another thing that motivated this approach, and that I also should have left a TODO about, is that we're trying to address a constant frustration with many bulk operations: "why is this taking so long?" is often hard to determine. Right now if we're stuck waiting on an intent, there's no indication of that -- we just see the request and the overall BACKUP taking longer / not making progress. Informing the job that we hit an intent lets us send that information back to the coordinator, so we can surface that status, e.g. something like "waiting on uncommitted transactions in 6 ranges". That in turn gives the operator a hint that maybe they need to back-date more, or maybe they need to look for some lost/stuck txns.

I don't particularly like just slapping high priority on all backups without an imposed backdate either, since impact on foreground traffic is one of the top concerns we hear every time we talk to production operators.
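A minimal sketch of the re-queue behavior described above (the types and the exportFn callback are stand-ins for illustration, not the PR's actual processor code):

```
package backupsketch

import "context"

// spanAttempt stands in for the processor's per-span bookkeeping.
type spanAttempt struct {
	startKey, endKey string
	attempts         int
}

// exportFn stands in for sending an ExportRequest for a span; it reports
// whether the range bailed out because it hit an intent.
type exportFn func(ctx context.Context, sp spanAttempt) (hitIntent bool, err error)

// runBackup works through the spans in order. Any span whose range bails out
// on an intent is pushed to the back of the queue, so the backup keeps making
// progress elsewhere and only revisits the conflicted span later.
func runBackup(ctx context.Context, spans []spanAttempt, export exportFn) error {
	queue := append([]spanAttempt(nil), spans...)
	for len(queue) > 0 {
		sp := queue[0]
		queue = queue[1:]
		hitIntent, err := export(ctx, sp)
		if err != nil {
			return err
		}
		if hitIntent {
			sp.attempts++
			queue = append(queue, sp) // come back to it after the other spans
		}
	}
	return nil
}
```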
Reviewed 1 of 4 files at r2.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @dt)
pkg/ccl/backupccl/backup_processor.go, line 42 at r2 (raw file):
"kv.bulk_io_read.backup_priority_attempt_limit", "number of reads BACKUP should attempt before using high priority", 3,
As an alternative (or maybe an addition) to having a backoff for every retry, I wonder if it would be useful to have a knob where the user could control how much time should pass before we start using high-priority transactions.
This would work similarly to defaulting to AOST -$cluster_setting seconds, but would be enforced by default as a cluster setting. Also, instead of reading in the past it would read now, but wait the period of the cluster setting if it gets blocked. This may help control the tradeoff between "I don't want my backups affecting my foreground traffic" and "But if my backup is taking really long I can sacrifice some retries on already long transactions".
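A sketch of the knob being suggested here, essentially what later lands in this PR as `bulkio.backup.read_with_priority_after` (the description string, default value, and helper are assumptions for illustration):

```
package backupsketch

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/settings"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

var priorityAfter = settings.RegisterNonNegativeDurationSetting(
	"bulkio.backup.read_with_priority_after",
	"amount of time since the read-as-of time above which a BACKUP should use priority when retrying reads",
	time.Minute, // default is illustrative only
)

// shouldUseHighPriority reports whether enough wall time has passed since the
// backup's read-as-of time that retried ExportRequests should start pushing
// conflicting transactions instead of yielding to them.
func shouldUseHighPriority(sv *settings.Values, readAsOf time.Time) bool {
	return timeutil.Since(readAsOf) > priorityAfter.Get(sv)
}
```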
Force-pushed from b576652 to dc427c4.
Okay, switched.
Reviewed 7 of 13 files at r3, 4 of 8 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @miretskiy, and @pbardea)
pkg/ccl/backupccl/backup_processor.go, line 187 at r4 (raw file):
// check to see if it is time to switch to priority.
if !priority && span.attempts > 0 {
  // Check if this is starting a new pass and we should delay first.
want to expand the comment to explain why it's okay to wait (everything else is lower priority anyway)?
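A sketch of the delay logic being discussed, using the per-attempt `read_retry_delay` idea that shows up later in this thread (everything other than the setting name is illustrative):

```
package backupsketch

import "time"

// delayBeforeRetry computes how long to wait before re-attempting a span:
// each prior attempt adds one unit of readRetryDelay (e.g. the 5s
// bulkio.backup.read_retry_delay below), measured from the backup's
// read-as-of time. Sleeping here is tolerable because by this point the
// spans still in the queue have generally already been attempted too.
func delayBeforeRetry(readAsOf time.Time, attempts int, readRetryDelay time.Duration) time.Duration {
	target := readAsOf.Add(time.Duration(attempts) * readRetryDelay)
	return time.Until(target) // may be negative, i.e. no wait needed
}
```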
pkg/roachpb/api.proto, line 1433 at r4 (raw file):
int64 target_file_size = 10;
// FailOnIntents causes the request to return a TransactionRetryError if it
s/TransactionRetryError/UnhandledRetryeError/?
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @dt, @miretskiy, and @pbardea)
Force-pushed from 32cd26f to 19ead9f.
The mechanics in the second commit look good now. Thanks for adding in the `read_with_priority_after` window.
The remaining question I have is whether we want to repurpose this error type.
Reviewed 2 of 13 files at r3, 8 of 8 files at r5, 8 of 8 files at r6.
Reviewable status: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @dt)
pkg/ccl/backupccl/backup_processor.go, line 42 at r6 (raw file):
)
priorityAfter = settings.RegisterNonNegativeDurationSetting(
  "bulkio.backup.read_with_priority_after",
s/priority/high_priority/
here and in the sentence below.
pkg/ccl/backupccl/backup_processor.go, line 49 at r6 (raw file):
"bulkio.backup.read_retry_delay", "amount of time since the read-as-of time, per-prior attempt, to wait before making another attempt", time.Second*5,
nit: 5*time.Second
pkg/ccl/backupccl/backup_processor.go, line 223 at r6 (raw file):
if pErr != nil {
  if err := pErr.Detail.GetUnhandledRetry(); err != nil {
    span.lastTried = timeutil.Now()
Do we want to use the HLC clock here and talk in terms of HLC timestamps?
pkg/ccl/storageccl/export.go, line 160 at r6 (raw file):
if err != nil {
  if err, ok := err.(*roachpb.WriteIntentError); ok && args.FailOnIntents {
    wrapped := &roachpb.UnhandledRetryableError{}
The fact that we're not wrapping an error is an indication that we don't want to repurpose this error type, but I'll still defer to @andreimatei on that.
pkg/roachpb/errors.proto, line 518 at r5 (raw file):
LeaseRejectedError lease_rejected = 13;
NodeUnavailableError node_unavailable = 14;
UnhandledRetryableError unhandled_retry = 40;
@andreimatei do you mind taking a look at this? My intuition is that we don't want to repurpose this error type and certainly don't want to add it to this union because its purpose is to wrap these errors in this union when converted into a Go error (see `Error.GoError`). Adding it to the union forces us to ask the question of what an `UnhandledRetryableError{Err: UnhandledRetryableError{...}}` means. I think you have a clearer picture of what the vision of this error type was, though.
Reviewable status: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @dt)
pkg/ccl/storageccl/export.go, line 160 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The fact that we're not wrapping an error is an indication that we don't want to repurpose this error type, but I'll still defer to @andreimatei on that.
we are wrapping an error, aren't we? I set PErr on the line below to the actual WriteIntentError.
pkg/roachpb/errors.proto, line 518 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
@andreimatei do you mind taking a look at this? My intuition is that we don't want to repurpose this error type and certainly don't want to add it to this union because its purpose is to wrap these errors in this union when converted into a Go error (see `Error.GoError`). Adding it to the union forces us to ask the question of what an `UnhandledRetryableError{Err: UnhandledRetryableError{...}}` means. I think you have a clearer picture of what the vision of this error type was, though.
I'm also eager to learn more about the history and motivation for this error type, but my brief spelunking made me think it was appropriate for this use -- as I understood it from the comments, it's a type of error that callers should expect to see when they're returned a retryable error, wrapping some underlying error that explains why they're being asked to retry.
By my read, that is exactly what I want here -- my caller to see a retryable error, wrapping the underlying (WriteIntent) error. What I believe is different about my usage, and why I wanted to put it in the union, is that I wanted to give the server the choice to wrap it before even replying.
I don't expect distsender to wrap an error that is already `UnhandledRetryableError` in another layer of `UnhandledRetryableError`, so I don't think we're any more forced to consider what `UnhandledRetryableError(UnhandledRetryableError(x))` means than we were before.
Is it just intents that exports need to worry about, or is it also unreplicated locks? Unless there's something special about `ExportRequests`, those should block you just as much. If so, I think you want to expand the handling of the new flag and consider it when evaluation looks through the lock table (not sure where exactly, but I can dig or we can ask Nathan). And even intents, I think, can be discovered through the lock table, not only through scans, depending on whether the `Export` is the first guy to run into a specific intent, or if other requests ran into it recently. I'm sorry if I'm off.
Another thought I have is that I think it's pretty brutal how the patch currently hides the `WriteIntentError`. Part of handling that error is putting the intent into the lock table. I think it'd be nice to structure the code such that that part still happens, even if the request then doesn't actually wait for the intent to be resolved. It's not a big deal if Nathan didn't feel the same, but I'm surprised he didn't. This dove-tails with my other comment about how maybe we don't want to wrap the `WriteIntentError` in anything.
Reviewable status: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @dt, and @nvanbenschoten)
pkg/ccl/backupccl/backup_processor.go, line 218 at r6 (raw file):
req.FailOnIntents = true
}
log.Infof(ctx, "sending ExportRequest for span %s (attempt %d, priority %s)",
did you want to leave this at `Info` level?
pkg/roachpb/api.proto, line 1433 at r6 (raw file):
int64 target_file_size = 10;
// FailOnIntents causes the request to return a UnhandledRetryeError if it
UnhandledRetry*e*Error
pkg/roachpb/errors.proto, line 518 at r5 (raw file):
As I understood it from the comments, it's a type of error that callers should expect to see when they're returned a retryable error, wrapping some underlying error that explains why they're being asked to retry.
Not really, depending on what "callers" we're talking about. Generally code under the `TxnCoordSender` deals with specific retriable errors (e.g. `TransactionRetryError`), and code above the `TxnCoordSender` deals with `TransactionRetryWithProtoRefreshError` (which corresponds to a "handled" retriable error - the name cracks me up every time).
I don't think `UnhandledRetryableError` belongs here. `UnhandledRetryableError` is used by DistSQL leaf transactions to communicate with the DistSQL root transaction (it is produced on kvclient side); it's not meant for communication between kvserver and kvclient. There's also this assertion (line 835 in 9547502) that verifies that generally the user of a transaction doesn't get this error:
log.Fatalf(ctx, "unexpected UnhandledRetryableError at the txn.exec() level: %s", err)
I also don't think that `UnhandledRetryableError` should ever wrap a `WriteIntentError`; a write intent is not retriable in the same sense as the errors that implement `transactionRestartError` - those are the retriable ones.
If `ExportRequest` needs to return some new error type, then I'd much rather we introduce a brand new error type in this enum. But I think a better idea might be to just return `WriteIntentError` to the client. In fact, have you considered making `fail_on_intents` a field on the `RequestHeader`? It could apply in the same way to any request - it would inhibit the call to `handleWriteIntentError()` in `replica.executeBatchWithConcurrencyRetries()`. FWIW, if we were to do that, we'd change the character of `WriteIntentError` a bit - I think currently this error is never returned from the server to the client, and is only defined as a proto because we had the bad idea to standardize this `Sender` interface that returns `pErr` and use it both on the client side and on the server side. I'm suggesting we start returning `WriteIntentError` to the client - but that makes sense to me.
Great point, I completely missed this, but @andreimatei's correct. The lock table needs to be made aware of this flag as well. As is, this is only a partial solution. This is a pretty strong indication that we should generalize this concept. Specifically, I think it's time to consider adding a
cockroach/pkg/sql/sem/tree/select.go, lines 1030 to 1033 in c09aeaf
I'm a little less eager to support
Anyway, @dt this would require a bit of KV work, but then we could remove all of the
@nvanbenschoten More than happy to wait (har har) to rebase this if you think it's feasible that it'll be in the KV API in time to get this in for 20.2. A generalized WaitPolicy sounds like exactly what we want, and is on the path, I believe, to some of our other desires as well (e.g. limit how long we wait trying to contact unavailable ranges or other places we currently see hangs). Thanks!
Oh, I meant to say: I might suggest
@dt I just pushed #52388, which should provide you with a better interface to build this PR on top of. You should be able to set a
I didn't end up going along this route. The nested enum approach is tricky to keep a) free in the common case and b) backwards compatible when expressed as protobufs. I'm also not convinced that it's needed. The idea of a dial timeout is well out of scope for this work and I don't think it would be a good idea to conflate it with what we're talking about here.
Informs cockroachdb#40476. Informs cockroachdb#51624.

This commit introduces the concept of request "wait policies". A WaitPolicy specifies the behavior of a request when it encounters conflicting locks held by other active transactions. The default behavior is to block until the conflicting lock is released, but other policies can make sense in special situations.

Within this new formalization, the commit creates two initial wait policy variants:
```
// Block indicates that if a request encounters a conflicting lock held by
// another active transaction, it should wait for the conflicting lock to be
// released before proceeding.
Block = 0;

// Error indicates that if a request encounters a conflicting lock held by
// another active transaction, it should raise an error instead of blocking.
Error = 1;
```
`Block` is equivalent to the current behavior, so there's no further action needed for that variant. However, the `Error` policy is new, and this commit teaches the `lockTableWaiter` about it. It ensures that when a request with the `Error` wait policy encounters a conflicting lock, it uses a PUSH_TOUCH PushRequest to determine whether the lock is abandoned or whether its holder is still active. If the holder is abandoned, the request cleans up the lock and proceeds. If the holder is still active, a WriteIntentError is returned to the client. This will unblock both of the referenced issues.
Thanks for jumping on this @nvanbenschoten! I've rebased this PR on yours and am happy to see it working exactly as expected. I'll rebase again after yours merges, but otherwise I think this is all set for a final review from bulk-folks just looking at the second commit.

@nvanbenschoten just to clarify my API suggestion: I certainly agree that anything beyond error-vs-block is out of scope for now -- I was just thinking that if we add more details later, like a dial timeout or an alternative block-up-to-limit versus just block-or-error, then it might be nice to have a message-type wrapper to add them to. But I'm also completely happy with this as-is -- it was just a suggestion.
52388: kv: add WaitPolicy option to BatchRequest r=nvanbenschoten a=nvanbenschoten
Informs #40476. Informs #51624.
Co-authored-by: Nathan VanBenschoten <[email protected]>
Just merged, so you should be all set!
BACKUP, like any read, cannot export a range if it contains an unresolved intent. If we hit an intent during iteration, we typically return a WriteIntentError like any scan and let the store resolve it and retry the request before replying to the sender. However, blocking intent resolution can mean a single txn left open can cause backups to hang, sometimes for hours or even days in some observed cases.

While operators typically want bulk, background jobs like BACKUP to have as little impact on their foreground traffic as possible, even more importantly they want their BACKUPs to actually complete to maintain their RPO SLA. So while you typically would not want a BACKUP to immediately run at high priority and abort any transactions from OLTP foreground traffic as soon as it starts, you do, eventually, want it to do whatever it needs to do to finish.

Backdating BACKUPs with AS OF SYSTEM TIME '-30s' is a good best practice to reduce the number of still-running transactions they may encounter, but an abandoned or very long-running transaction can still hang such a backup. This change extends BACKUP to initially ask individual ranges to back up, but bail out if they encounter another transaction, rather than attempting to resolve it and holding up the whole BACKUP while doing so. If any range replies that it bailed out, it is put at the back of the queue to come back to later, and BACKUP continues to work on other ranges in the hope that by the time it is done with them, the intent may have already resolved itself. When it comes back to a range that it has already attempted to back up, if the time since the backup's read-time is greater than a configurable limit, it instead sets the priority to high, meaning any transaction encountered will be pushed so the backup can complete.

Release note (enterprise change): BACKUP takes priority over other transactions if its initial attempts to export a range fail.
Thanks @nvanbenschoten, working like a charm! And I realized the bulk-reviewers already LGTM'ed so I guess this is now all good to go. bors r+
Build succeeded: |
BACKUP, like any read, cannot export a range if it contains an unresolved intent.
If we hit an intent during iteration, we typically return a WriteIntentError like
any scan and let the store resolve it and retry the request before replying to the
sender.
However blocking intent resolution can mean a single txn left open can cause backups
to hang, sometimes for hours or even days in some observed cases. While operators
typically want bulk, background jobs like BACKUP to have as little impact on their
foreground traffic as possible, even more importantly they want their BACKUPs to
actually complete to maintain their RPO SLA. So while you typically would not want
a BACKUP to immediately run at high priority and abort any transactions from OLTP
foreground traffic as soon as it starts, you do, eventually, want it to do whatever
it needs to do to finish.
Backdating BACKUPs with AS OF SYSTEM TIME '-30s' is a good best practice to reduce
the number of still-running transactions they may encounter, but an abandoned or very
long-running transaction can still hang such a backup. This change extends BACKUP to
initially start asking individual ranges to backup but bail out if they encounter an
intent, rather than attempting to resolve it and holding up the whole BACKUP while
doing so. If any range replies that it bailed out, it is put at the back of the queue
to come back to later, and BACKUP continues to work on other ranges, in the hopes that
by the time it is done with them, the intent may have already resolved itself.
When it comes back to a range that it has already attempted to back up as many
times as the limit allows (default 3), it instead asks the range to back up but
not bail out if it hits an intent, and instead do the usual behavior of trying
to resolve it. It additionally sets the priority to high, so that during that
resolution, other transactions will be aborted so that the backup can finish.
Release note (enterprise change): BACKUP takes priority over other transactions if its initial attempts to export a range fail.
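Pulling the thread together, here is a rough sketch of the per-attempt header the processor can build once #52388's WaitPolicy is available: early attempts ask the range to error out on conflicting intents so the span can be re-queued, and once the read time is older than the `read_with_priority_after` window the request runs at high priority and blocks. Everything beyond the identifiers quoted in this thread (the function shape, the setting's description and default, the exact wiring) is an assumption, not the merged code.

```
package backupsketch

import (
	"time"

	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/lock"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/settings"
	"github.com/cockroachdb/cockroach/pkg/util/hlc"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

// priorityAfter mirrors the bulkio.backup.read_with_priority_after setting
// discussed in the review; the description and default here are illustrative.
var priorityAfter = settings.RegisterNonNegativeDurationSetting(
	"bulkio.backup.read_with_priority_after",
	"amount of time since the read-as-of time above which a BACKUP should use priority when retrying reads",
	time.Minute,
)

// exportHeader decides how aggressively to read a span. Early attempts ask
// the range to return an error on conflicting intents (so the span can be
// re-queued and retried later); once the read-as-of time is older than
// priorityAfter, the request runs at high priority and blocks, pushing
// conflicting transactions so the backup can finish.
func exportHeader(sv *settings.Values, readAsOf hlc.Timestamp) roachpb.Header {
	h := roachpb.Header{Timestamp: readAsOf}
	readWall := timeutil.Unix(0, readAsOf.WallTime)
	if timeutil.Since(readWall) > priorityAfter.Get(sv) {
		h.UserPriority = roachpb.MaxUserPriority
		h.WaitPolicy = lock.WaitPolicy_Block
	} else {
		h.WaitPolicy = lock.WaitPolicy_Error
	}
	return h
}
```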