Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: interleavedpartitioned failure: client already committed or rolled back #28796

Closed
petermattis opened this issue Aug 18, 2018 · 2 comments
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model.
Milestone

Comments

@petermattis
Copy link
Collaborator

#28795 fixes up the test setup for interleavedpartitioned which seems to have been masking failures. Now when I run it I frequently see failures like:

central: Error: pq: TransactionStatusError: client already committed or rolled back the transaction (REASON_UNKNOWN)
central: Error:  Process exited with status 1

This fails the central workload which in turn fails the test.

@andreimatei for triage.

@petermattis petermattis added the A-kv-transactions Relating to MVCC and the transactional model. label Aug 18, 2018
@petermattis petermattis added this to the 2.1 milestone Aug 18, 2018
@andreimatei
Copy link
Contributor

Looking. Same error reported for loadgen/kv in #28554 too.

@andreimatei
Copy link
Contributor

I believe I've found the problem: #28554 (comment)

Will send a fix.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Aug 20, 2018
When a client tries to commit a txn that has performed writes at an old
epoch but has only done reads at the current epoch, one of the
TxnCoordSender interceptors turns the commit into a rollback (for
reasons described in the code).
This patch completes that interceptor's lie by updating the txn status
upon success to COMMITTED instead of ABORTED. Since a commit is what the
client asked for, it seems sane to pretend as best we can that that's
what it got. In particular, this is important for the sql module, where
the ConnExecutor looks at the txn proto's status to discriminate between
cases where a "1pc planNode" already committed an implicit txn versus
situations where it needs to commit it itself. This was causing the
executor to think the txn was not committed and to attempt to commit
again, which resulted in an error.
I don't know if we like the ConnExecutor looking at the proto status,
but I'll leave that alone.

Fixes cockroachdb#28554
Fixes cockroachdb#28796

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Aug 21, 2018
When a client tries to commit a txn that has performed writes at an old
epoch but has only done reads at the current epoch, one of the
TxnCoordSender interceptors turns the commit into a rollback (for
reasons described in the code).
This patch completes that interceptor's lie by updating the txn status
upon success to COMMITTED instead of ABORTED. Since a commit is what the
client asked for, it seems sane to pretend as best we can that that's
what it got. In particular, this is important for the sql module, where
the ConnExecutor looks at the txn proto's status to discriminate between
cases where a "1pc planNode" already committed an implicit txn versus
situations where it needs to commit it itself. This was causing the
executor to think the txn was not committed and to attempt to commit
again, which resulted in an error.
I don't know if we like the ConnExecutor looking at the proto status,
but I'll leave that alone.

Fixes cockroachdb#28554
Fixes cockroachdb#28796

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Aug 21, 2018
When a client tries to commit a txn that has performed writes at an old
epoch but has only done reads at the current epoch, one of the
TxnCoordSender interceptors turns the commit into a rollback (for
reasons described in the code).
This patch completes that interceptor's lie by updating the txn status
upon success to COMMITTED instead of ABORTED. Since a commit is what the
client asked for, it seems sane to pretend as best we can that that's
what it got. In particular, this is important for the sql module, where
the ConnExecutor looks at the txn proto's status to discriminate between
cases where a "1pc planNode" already committed an implicit txn versus
situations where it needs to commit it itself. This was causing the
executor to think the txn was not committed and to attempt to commit
again, which resulted in an error.
I don't know if we like the ConnExecutor looking at the proto status,
but I'll leave that alone.

Fixes cockroachdb#28554
Fixes cockroachdb#28796

Release note: None
craig bot pushed a commit that referenced this issue Aug 21, 2018
28872: kv: lie better about commits that are really rollbacks r=andreimatei a=andreimatei

When a client tries to commit a txn that has performed writes at an old
epoch but has only done reads at the current epoch, one of the
TxnCoordSender interceptors turns the commit into a rollback (for
reasons described in the code).
This patch completes that interceptor's lie by updating the txn status
upon success to COMMITTED instead of ABORTED. Since a commit is what the
client asked for, it seems sane to pretend as best we can that that's
what it got. In particular, this is important for the sql module, where
the ConnExecutor looks at the txn proto's status to discriminate between
cases where a "1pc planNode" already committed an implicit txn versus
situations where it needs to commit it itself. This was causing the
executor to think the txn was not committed and to attempt to commit
again, which resulted in an error.
I don't know if we like the ConnExecutor looking at the proto status,
but I'll leave that alone.

Fixes #28554
Fixes #28796

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
@craig craig bot closed this as completed in #28872 Aug 21, 2018
craig bot pushed a commit that referenced this issue Aug 21, 2018
28911: release-2.1: kv: lie better about commits that are really rollbacks r=andreimatei a=andreimatei

cc @cockroachdb/release 

Backport #28872

When a client tries to commit a txn that has performed writes at an old
epoch but has only done reads at the current epoch, one of the
TxnCoordSender interceptors turns the commit into a rollback (for
reasons described in the code).
This patch completes that interceptor's lie by updating the txn status
upon success to COMMITTED instead of ABORTED. Since a commit is what the
client asked for, it seems sane to pretend as best we can that that's
what it got. In particular, this is important for the sql module, where
the ConnExecutor looks at the txn proto's status to discriminate between
cases where a "1pc planNode" already committed an implicit txn versus
situations where it needs to commit it itself. This was causing the
executor to think the txn was not committed and to attempt to commit
again, which resulted in an error.
I don't know if we like the ConnExecutor looking at the proto status,
but I'll leave that alone.

Fixes #28554
Fixes #28796

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model.
Projects
None yet
Development

No branches or pull requests

2 participants