sql: prioritize retryable errors in synchronizeParallelStmts
At the moment, parallel statement execution works by sending batches concurrently through a single `client.Txn`. This makes the handling of retryable errors tricky because it's difficult to know when it's safe to prepare the transaction state for a retry. Our approach to this is far from optimal and relies on a mess of locking in both `client.Txn` and `TxnCoordSender`. This works well enough to prevent anything from seriously going wrong (#17197), but it can result in some confounding error behavior when statements operate in the context of transaction epochs that they weren't expecting.

The ideal situation would be for all statements with a handle to a txn to always work under the same txn epoch at any single point in time. Any retryable error seen by these statements would be propagated up through `client.Txn` without changing any state (and without yet being converted to a `HandledRetryableTxnError`), and only after the statements have all been synchronized would the retryable error be used to update the txn and prepare for the retry attempt. This would require a change like #22615. I've created a POC for this approach, but it is far too invasive to cherry-pick.

So, given the current state of things, we need to do a better job of catching errors caused by concurrent retries. In the past we've tried to carefully determine which errors could be a symptom of a concurrent retry and ignore them. I now think this was a mistake, as the process of inferring which errors could be caused by a txn retry is error-prone.

We now always return retryable errors from `synchronizeParallelStmts` when they exist. The reasoning is that if an error was a symptom of the txn retry, it will not be present during the next txn attempt. If it was not, and was instead a legitimate query execution error, we expect to hit it again on the next txn attempt, and the behavior will mirror the case where the statement that threw the execution error had not even run before the parallel queue hit the retryable error.

Release note: None
1 parent 34c7c69, commit 3692794
Showing 3 changed files with 82 additions and 62 deletions.