-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: interleavedpartitioned failure: client already committed or rolled back #28796
Comments
Looking. Same error reported for loadgen/kv in #28554 too. |
I believe I've found the problem: #28554 (comment) Will send a fix. |
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Aug 20, 2018
When a client tries to commit a txn that has performed writes at an old epoch but has only done reads at the current epoch, one of the TxnCoordSender interceptors turns the commit into a rollback (for reasons described in the code). This patch completes that interceptor's lie by updating the txn status upon success to COMMITTED instead of ABORTED. Since a commit is what the client asked for, it seems sane to pretend as best we can that that's what it got. In particular, this is important for the sql module, where the ConnExecutor looks at the txn proto's status to discriminate between cases where a "1pc planNode" already committed an implicit txn versus situations where it needs to commit it itself. This was causing the executor to think the txn was not committed and to attempt to commit again, which resulted in an error. I don't know if we like the ConnExecutor looking at the proto status, but I'll leave that alone. Fixes cockroachdb#28554 Fixes cockroachdb#28796 Release note: None
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Aug 21, 2018
When a client tries to commit a txn that has performed writes at an old epoch but has only done reads at the current epoch, one of the TxnCoordSender interceptors turns the commit into a rollback (for reasons described in the code). This patch completes that interceptor's lie by updating the txn status upon success to COMMITTED instead of ABORTED. Since a commit is what the client asked for, it seems sane to pretend as best we can that that's what it got. In particular, this is important for the sql module, where the ConnExecutor looks at the txn proto's status to discriminate between cases where a "1pc planNode" already committed an implicit txn versus situations where it needs to commit it itself. This was causing the executor to think the txn was not committed and to attempt to commit again, which resulted in an error. I don't know if we like the ConnExecutor looking at the proto status, but I'll leave that alone. Fixes cockroachdb#28554 Fixes cockroachdb#28796 Release note: None
andreimatei
added a commit
to andreimatei/cockroach
that referenced
this issue
Aug 21, 2018
When a client tries to commit a txn that has performed writes at an old epoch but has only done reads at the current epoch, one of the TxnCoordSender interceptors turns the commit into a rollback (for reasons described in the code). This patch completes that interceptor's lie by updating the txn status upon success to COMMITTED instead of ABORTED. Since a commit is what the client asked for, it seems sane to pretend as best we can that that's what it got. In particular, this is important for the sql module, where the ConnExecutor looks at the txn proto's status to discriminate between cases where a "1pc planNode" already committed an implicit txn versus situations where it needs to commit it itself. This was causing the executor to think the txn was not committed and to attempt to commit again, which resulted in an error. I don't know if we like the ConnExecutor looking at the proto status, but I'll leave that alone. Fixes cockroachdb#28554 Fixes cockroachdb#28796 Release note: None
craig bot
pushed a commit
that referenced
this issue
Aug 21, 2018
28872: kv: lie better about commits that are really rollbacks r=andreimatei a=andreimatei When a client tries to commit a txn that has performed writes at an old epoch but has only done reads at the current epoch, one of the TxnCoordSender interceptors turns the commit into a rollback (for reasons described in the code). This patch completes that interceptor's lie by updating the txn status upon success to COMMITTED instead of ABORTED. Since a commit is what the client asked for, it seems sane to pretend as best we can that that's what it got. In particular, this is important for the sql module, where the ConnExecutor looks at the txn proto's status to discriminate between cases where a "1pc planNode" already committed an implicit txn versus situations where it needs to commit it itself. This was causing the executor to think the txn was not committed and to attempt to commit again, which resulted in an error. I don't know if we like the ConnExecutor looking at the proto status, but I'll leave that alone. Fixes #28554 Fixes #28796 Release note: None Co-authored-by: Andrei Matei <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Aug 21, 2018
28911: release-2.1: kv: lie better about commits that are really rollbacks r=andreimatei a=andreimatei cc @cockroachdb/release Backport #28872 When a client tries to commit a txn that has performed writes at an old epoch but has only done reads at the current epoch, one of the TxnCoordSender interceptors turns the commit into a rollback (for reasons described in the code). This patch completes that interceptor's lie by updating the txn status upon success to COMMITTED instead of ABORTED. Since a commit is what the client asked for, it seems sane to pretend as best we can that that's what it got. In particular, this is important for the sql module, where the ConnExecutor looks at the txn proto's status to discriminate between cases where a "1pc planNode" already committed an implicit txn versus situations where it needs to commit it itself. This was causing the executor to think the txn was not committed and to attempt to commit again, which resulted in an error. I don't know if we like the ConnExecutor looking at the proto status, but I'll leave that alone. Fixes #28554 Fixes #28796 Release note: None Co-authored-by: Andrei Matei <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
#28795 fixes up the test setup for
interleavedpartitioned
which seems to have been masking failures. Now when I run it I frequently see failures like:This fails the
central
workload which in turn fails the test.@andreimatei for triage.
The text was updated successfully, but these errors were encountered: