A transaction may be committed and TiDB doesn't return an undetermined error #21355

Open
youjiali1995 opened this issue Nov 27, 2020 · 3 comments
Labels
severity/moderate sig/transaction SIG:Transaction type/bug The issue is confirmed as a bug.

Comments

@youjiali1995
Contributor

youjiali1995 commented Nov 27, 2020

Bug Report

A region error does not mean that a write request has failed.

Currently, TiDB sets an undetermined error only when it encounters an RPC error. It is therefore possible for a transaction to be committed without TiDB returning an undetermined error. Fortunately, TiDB cleans up the primary lock first, so this does not break the atomicity of the transaction.
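To make the distinction concrete, here is a minimal sketch of how the result of committing the primary key could be classified. All names (`ErrRPC`, `ErrRegion`, `ErrWriteConflict`, `ClassifyPrimaryCommit`) are hypothetical and are not TiDB's actual types. Treating every region error as possibly applied is a conservative assumption made for illustration; in practice only region errors returned after the request was proposed would be ambiguous.

```go
package main

import "errors"

// Hypothetical error kinds for illustration only; not TiDB's real error types.
var (
	ErrRPC           = errors.New("rpc error")      // network failure or timeout
	ErrRegion        = errors.New("region error")   // e.g. StaleCommand, RegionNotFound
	ErrWriteConflict = errors.New("write conflict") // definite rejection
)

// Outcome is what the client can conclude about the commit.
type Outcome int

const (
	Committed    Outcome = iota // the commit is known to have succeeded
	Failed                      // the commit is known not to have happened
	Undetermined                // the commit may or may not have been applied
)

// ClassifyPrimaryCommit maps the error from committing the primary key
// to an outcome. The behavior described in this issue corresponds to
// returning Undetermined only for ErrRPC; this sketch assumes region
// errors are ambiguous as well.
func ClassifyPrimaryCommit(err error) Outcome {
	switch {
	case err == nil:
		return Committed
	case errors.Is(err, ErrRPC), errors.Is(err, ErrRegion):
		return Undetermined
	default:
		return Failed
	}
}
```

With this classification, a `StaleCommand` returned for an already proposed commit would surface as an undetermined result instead of a plain failure.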

youjiali1995 added the type/bug and sig/transaction labels on Nov 27, 2020
@youjiali1995
Contributor Author

/cc @cfzjywxk @MyonKeminta @sticnarf @nrc

@youjiali1995
Contributor Author

youjiali1995 commented Nov 27, 2020

When TiKV shuts down gracefully, it may return StaleCommand to all proposed requests (the gRPC server is shut down asynchronously): https://github.com/tikv/tikv/blob/4ed382c21699596a84aa3342b27e3221c0741893/components/raftstore/src/store/fsm/apply.rs#L3006-L3010

A possible fix is to shut down the gRPC server synchronously.

When a region is destroyed, it returns RegionNotFound to all proposed requests: https://github.com/tikv/tikv/blob/4ed382c21699596a84aa3342b27e3221c0741893/components/raftstore/src/store/fsm/apply.rs#L1132-L1144

It is not clear to me under what circumstances this case can occur now. I think we should list all the cases that may result in undetermined states.

@sticnarf
Contributor

The second case looks unlikely to happen. Transfer and region merge should not result in undetermined states before destroying the peer.
If a peer destroys itself after becoming orphaned, as in tikv/tikv#9113, the problem can happen. But it shouldn't be a problem if we add a timeout to the callback.
