roachtest: interleavedpartitioned failure: client already committed or rolled back #28796

petermattis · 2018-08-18T19:32:13Z

#28795 fixes up the test setup for interleavedpartitioned, which seems to have been masking failures. Now when I run the test, I frequently see failures like:

central: Error: pq: TransactionStatusError: client already committed or rolled back the transaction (REASON_UNKNOWN)
central: Error:  Process exited with status 1

This fails the central workload which in turn fails the test.

@andreimatei for triage.


andreimatei · 2018-08-20T17:17:04Z

Looking. Same error reported for loadgen/kv in #28554 too.

andreimatei · 2018-08-20T18:48:59Z

I believe I've found the problem: #28554 (comment)

Will send a fix.

When a client tries to commit a txn that has performed writes at an old epoch but has only done reads at the current epoch, one of the TxnCoordSender interceptors turns the commit into a rollback (for reasons described in the code). This patch completes that interceptor's lie by updating the txn status upon success to COMMITTED instead of ABORTED. Since a commit is what the client asked for, it seems sane to pretend as best we can that that's what it got.

In particular, this is important for the sql module, where the ConnExecutor looks at the txn proto's status to discriminate between cases where a "1pc planNode" already committed an implicit txn versus situations where it needs to commit it itself. This was causing the executor to think the txn was not committed and to attempt to commit again, which resulted in an error. I don't know if we like the ConnExecutor looking at the proto status, but I'll leave that alone.

Fixes cockroachdb#28554
Fixes cockroachdb#28796

Release note: None
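To make the double-commit concrete, here is a toy Go model of the interaction described in the commit message. It is only a sketch under stated assumptions: `Txn`, `commitViaInterceptor`, `executorFinish`, and the `lieBetter` flag are hypothetical names invented for illustration and do not correspond to CockroachDB's actual types or functions. It assumes the statement's own commit and the executor's follow-up commit go through the same path.

```go
package main

import "errors"

// TxnStatus is a hypothetical stand-in for the status field on a txn proto.
type TxnStatus int

const (
	Pending TxnStatus = iota
	Committed
	Aborted
)

// Txn is a toy transaction record (not CockroachDB's real type).
type Txn struct {
	Status          TxnStatus
	WroteInOldEpoch bool // writes were performed at an earlier epoch
	WroteInCurEpoch bool // writes were performed at the current epoch
}

var errAlreadyFinalized = errors.New(
	"client already committed or rolled back the transaction")

// commitViaInterceptor models the interceptor. If the txn wrote only at an
// old epoch, the commit is turned into a rollback. lieBetter selects which
// status is recorded afterwards: Committed (the behavior the patch
// describes) or Aborted (the behavior before it).
func commitViaInterceptor(t *Txn, lieBetter bool) error {
	if t.Status != Pending {
		return errAlreadyFinalized
	}
	if t.WroteInOldEpoch && !t.WroteInCurEpoch {
		// The commit is really a rollback.
		if lieBetter {
			t.Status = Committed
		} else {
			t.Status = Aborted
		}
		return nil
	}
	t.Status = Committed
	return nil
}

// executorFinish models the executor side: the statement first commits the
// implicit txn itself, then the executor checks the status and commits
// again only if the txn does not look committed.
func executorFinish(t *Txn, lieBetter bool) error {
	if err := commitViaInterceptor(t, lieBetter); err != nil {
		return err
	}
	if t.Status == Committed {
		return nil // already committed, nothing left to do
	}
	return commitViaInterceptor(t, lieBetter) // second commit attempt
}
```

In this model, calling `executorFinish(&Txn{WroteInOldEpoch: true}, false)` returns the "already committed or rolled back" error, because the status check sees Aborted and triggers a second commit. With `lieBetter` set to `true`, the same call returns nil and leaves the status as Committed.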

28872: kv: lie better about commits that are really rollbacks r=andreimatei a=andreimatei

Fixes #28554
Fixes #28796

Release note: None

Co-authored-by: Andrei Matei <[email protected]>

28911: release-2.1: kv: lie better about commits that are really rollbacks r=andreimatei a=andreimatei

cc @cockroachdb/release

Backport #28872

Fixes #28554
Fixes #28796

Release note: None

Co-authored-by: Andrei Matei <[email protected]>

petermattis added the A-kv-transactions Relating to MVCC and the transactional model. label Aug 18, 2018

petermattis added this to the 2.1 milestone Aug 18, 2018

petermattis assigned andreimatei Aug 18, 2018

andreimatei mentioned this issue Aug 20, 2018

kv: lie better about commits that are really rollbacks #28872

Merged

andreimatei mentioned this issue Aug 21, 2018

release-2.1: kv: lie better about commits that are really rollbacks #28911

Merged

craig bot closed this as completed in #28872 Aug 21, 2018
