Always throw errors on failure on critical connection in router executor #2215
Codecov Report
@@            Coverage Diff             @@
##           master    #2215      +/-   ##
==========================================
+ Coverage   93.64%   93.71%   +0.07%
==========================================
  Files         103      102       -1
  Lines       26385    26286      -99
==========================================
- Hits        24707    24634      -73
+ Misses       1678     1652      -26
Just FYI: this seems like a good candidate for a test using the new failure testing framework. We have not checked in the failure testing framework yet, but after we merge this PR, it should be easy to add a test in a new PR, similar to #2212.
I checked the different code paths that call StoreQueryResult() and ConsumeQueryResult() with failOnError = false to see whether raising errors in those cases could be problematic.
I couldn't find any logical problems, but I think this hurts readability a bit: a reader expects no errors to be thrown when this flag is false, yet errors can still be thrown.
- HandleRemoteTransactionConnectionError(connection, raiseErrors);
+ HandleRemoteTransactionConnectionError(connection, raiseIfTransactionIsCritical);
return false;
}

singleRowMode = PQsetSingleRowMode(connection->pgConn);
if (singleRowMode == 0)
For later: this should be an Assert(). According to the documentation, PQsetSingleRowMode() returns 1 when called immediately after PQsendQuery(); otherwise it leaves the mode unchanged and returns 0, which here could only be the result of a programming error.
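To make the suggestion concrete, here is a minimal standalone sketch that compiles without libpq. FakeConn and all Fake* functions are hypothetical stand-ins that only model the documented contract (single-row mode can be enabled only right after the query is sent); in the real code the call is PQsetSingleRowMode() and the check would presumably use PostgreSQL's Assert() rather than the C library assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a libpq connection; not the real PGconn. */
typedef struct FakeConn
{
	bool queryJustSent;  /* true only between sending a query and any other call */
	bool singleRowMode;  /* whether single-row mode is active for the current query */
} FakeConn;

/* Hypothetical stand-in for PQsendQuery(): dispatches a new query. */
static void
FakeSendQuery(FakeConn *conn)
{
	conn->queryJustSent = true;
	conn->singleRowMode = false;
}

/* Hypothetical stand-in for any other call on the connection, such as
 * PQconsumeInput() or PQgetResult(); afterwards it is too late to switch modes. */
static void
FakeOtherOperation(FakeConn *conn)
{
	conn->queryJustSent = false;
}

/*
 * Models the documented contract of PQsetSingleRowMode(): when called
 * immediately after the query was sent, it enables single-row mode and
 * returns 1; otherwise the mode stays unchanged and it returns 0.
 */
static int
FakeSetSingleRowMode(FakeConn *conn)
{
	if (!conn->queryJustSent)
	{
		return 0;
	}
	conn->singleRowMode = true;
	return 1;
}

/*
 * The pattern suggested in the review: the call directly follows the send,
 * so a 0 result can only mean a programming error. It is therefore checked
 * with an assertion instead of a runtime error branch.
 */
static void
SendQueryInSingleRowMode(FakeConn *conn)
{
	FakeSendQuery(conn);
	int singleRowMode = FakeSetSingleRowMode(conn);
	assert(singleRowMode == 1);
	(void) singleRowMode;  /* silence unused-variable warnings when NDEBUG is set */
}
```

The design point is that an assertion documents an invariant (and, for PostgreSQL's Assert(), is only active in assert-enabled builds), whereas an if-branch implies a runtime condition that callers are expected to handle.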
@@ -1581,7 +1585,8 @@ StoreQueryResult(CitusScanState *scanState, MultiConnection *connection,
category = ERRCODE_TO_CATEGORY(ERRCODE_INTEGRITY_CONSTRAINT_VIOLATION);
isConstraintViolation = SqlStateMatchesCategory(sqlStateString, category);

- if (isConstraintViolation || failOnError)
+ if (isConstraintViolation || failOnError ||
failOnError is misnamed here, since there are cases where it is false but we still fail. We should at least update the comments for StoreQueryResult() and ConsumeQueryResult().
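To illustrate why the name reads as misleading, the condition on the new side of the hunk can be reduced to a pure function. This is a hypothetical sketch: the function name is made up, and the third operand (whether the connection's transaction is critical) is an assumption taken from the PR description, because the diff excerpt is cut off after the trailing ||.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical reduction of the error-response check in StoreQueryResult().
 * Returns true when the error should be raised instead of letting the
 * caller try to continue. The third operand is an assumption: the diff
 * excerpt is truncated after the trailing ||.
 */
static bool
ShouldRaiseOnErrorResponse(bool isConstraintViolation, bool failOnError,
						   bool transactionIsCritical)
{
	/* failOnError == false does NOT guarantee a false result */
	return isConstraintViolation || failOnError || transactionIsCritical;
}
```

With failOnError set to false, the function still returns true for a constraint violation or a critical transaction, which is exactly the behavior the comments (or a better name) should make explicit.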
Note: I did some renaming based on Hadi's feedback, but only the first three commits should be backported.
DESCRIPTION: Fixes a bug that could cause transactions to incorrectly proceed after failure
Fixes #2214
Citus currently doesn't throw an error when a failure occurs in the router executor on a connection that was marked as critical. This can cause transactions to incorrectly proceed after a failure on a connection that must complete a two-phase commit (2PC) because of commands run earlier in the transaction. The commit handler is not prepared for that situation and panics.
This PR fixes the problem by checking whether the connection is critical when an error response is returned, and by throwing an error immediately when a connection failure occurs.
As a side effect, some error messages for queries with CTEs have become more meaningful.
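The connection-failure half of the fix can be modeled as below. This is a minimal sketch, not the Citus implementation: the types and the function are hypothetical, and it assumes (from the renamed raiseIfTransactionIsCritical argument in the diff) that a failure is always recorded on the connection but only turns into an error when the caller requests it and the transaction is critical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of handling a failure on one connection. */
typedef enum FailureOutcome
{
	FAILURE_CONTINUE,    /* transaction marked failed; caller may carry on */
	FAILURE_RAISE_ERROR  /* abort the distributed transaction right away */
} FailureOutcome;

/* Hypothetical per-connection remote transaction state. */
typedef struct RemoteTransactionState
{
	bool transactionCritical; /* earlier commands require this connection to commit */
	bool transactionFailed;   /* a failure has been seen on this connection */
} RemoteTransactionState;

/*
 * Hypothetical analogue of HandleRemoteTransactionConnectionError() after
 * the rename: the failure is always recorded, and an error is raised only
 * when the caller passes raiseIfTransactionIsCritical and the transaction
 * is critical. Returning an outcome stands in for ereport(ERROR).
 */
static FailureOutcome
HandleConnectionFailure(RemoteTransactionState *transaction,
						bool raiseIfTransactionIsCritical)
{
	transaction->transactionFailed = true;

	if (raiseIfTransactionIsCritical && transaction->transactionCritical)
	{
		return FAILURE_RAISE_ERROR;
	}

	return FAILURE_CONTINUE;
}
```

Under this model, the router executor would pass true, so a failure on a connection that must complete a 2PC stops the transaction immediately instead of reaching the commit handler.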
I've also defensively moved the call to CheckTransactionHealth out of CoordinatedRemoteTransactionsCommit, because leaving it there created a code path where we could throw an error in the commit handler.
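The reasoning behind that move can be sketched as a split between a check that may fail and a commit step that must not. This assumes PostgreSQL's usual separation between a pre-commit phase (where raising an error still aborts the transaction cleanly) and the commit phase (where, as described above, an error is unsafe). Participant, TransactionIsHealthy() and CommitParticipants() are hypothetical stand-ins for the Citus structures and for CheckTransactionHealth() and CoordinatedRemoteTransactionsCommit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant (remote connection) in a distributed transaction. */
typedef struct Participant
{
	bool critical;  /* must take part in the 2PC because of earlier writes */
	bool failed;    /* a failure has been recorded on this connection */
} Participant;

/*
 * Hypothetical analogue of CheckTransactionHealth(): reports whether every
 * critical participant is still healthy. Meant to run before the commit
 * point, where a negative answer can still abort the transaction cleanly.
 */
static bool
TransactionIsHealthy(const Participant *participants, size_t count)
{
	for (size_t i = 0; i < count; i++)
	{
		if (participants[i].critical && participants[i].failed)
		{
			return false;
		}
	}
	return true;
}

/*
 * Hypothetical analogue of the commit step: it runs after the point of no
 * return, so it performs no checks that could fail. It only counts how many
 * participants would receive COMMIT; participants that already failed are
 * skipped (after a passing health check, these can only be non-critical).
 */
static size_t
CommitParticipants(const Participant *participants, size_t count)
{
	size_t committed = 0;

	for (size_t i = 0; i < count; i++)
	{
		if (!participants[i].failed)
		{
			committed++;
		}
	}
	return committed;
}
```

A caller would run TransactionIsHealthy() in the pre-commit phase and raise an error there if it returns false, so that CommitParticipants() never has a reason to fail.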