Possible race condition between Commit and CheckTx from "to-be-updated" mempool #1091
Comments
I'll let the pros chime in on the bug but sure, push a patch for |
this is as far as github allows me to go: v0.14.0...omisego:36714eb1997930171a75e77189580c6ca8978e88 |
I found out that the above fix didn't exactly remove the condition, just made it much less frequent, so I missed it in the first run. I think this is a proper move, where it is the |
Note we're calling FlushSync here https://github.com/tendermint/tendermint/blob/develop/state/execution.go#L147, so maybe we should just move this call up right after Lock |
if we call it after, we might receive a "fresh" transaction from `broadcast_tx_sync` before old transactions (which were not committed).

Refs #1091

```
Commit is called with a lock on the mempool, meaning no calls to CheckTx can start. However, since CheckTx is called async in the mempool connection, some CheckTx might have already "sailed", when the lock is released in the mempool and Commit proceeds.

Then, that spurious CheckTx has not yet "begun" in the ABCI app (stuck in transport?). Instead, ABCI app manages to start to process the Commit. Next, the spurious, "sailed" CheckTx happens in the wrong place.
```
Bah. Seems right, thanks for catching this! To summarize the sequence of events:
The issue is that the tx was sent to the socket, but we didn't wait for the app to see it before moving forward with the commit. So we commit, the app updates its mempool state, and then it sees a tx that should actually now be the last tx in the mempool, not the first! Neat.
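For anyone following along, the ordering described in this thread can be reproduced with a small deterministic toy model. This is only a sketch under stated assumptions, not Tendermint code: the types `app` and `mempoolConn` and the function `run` are hypothetical, and the methods merely borrow the names `CheckTxAsync` and `FlushSync` to model "buffered until flushed" behaviour on the mempool connection. The transaction labels `fresh` and `old` are made up for illustration.

```go
package main

import "fmt"

// app is a hypothetical stand-in for the ABCI app; it only records
// the order in which it observes requests.
type app struct{ log []string }

// mempoolConn is a toy model of an async connection: requests are
// buffered and reach the app only when the buffer is flushed.
type mempoolConn struct {
	app     *app
	pending []string
}

// CheckTxAsync queues a CheckTx without waiting for the app to see it.
func (c *mempoolConn) CheckTxAsync(tx string) {
	c.pending = append(c.pending, tx)
}

// FlushSync delivers every buffered CheckTx to the app, in order.
func (c *mempoolConn) FlushSync() {
	for _, tx := range c.pending {
		c.app.log = append(c.app.log, "CheckTx("+tx+")")
	}
	c.pending = nil
}

// run replays the suspected sequence and returns what the app observed.
func run(flushBeforeCommit bool) []string {
	a := &app{}
	mp := &mempoolConn{app: a}

	// A fresh tx "sails" on the mempool connection before the lock is taken.
	mp.CheckTxAsync("fresh")

	// (mempool lock would be taken here) The proposed fix: flush first.
	if flushBeforeCommit {
		mp.FlushSync()
	}

	// Commit travels on a separate connection and reaches the app directly.
	a.log = append(a.log, "Commit")

	// Mempool update: the uncommitted old tx is re-checked, then flushed.
	mp.CheckTxAsync("old")
	mp.FlushSync()
	return a.log
}

func main() {
	fmt.Println("without flush:", run(false))
	fmt.Println("with flush:   ", run(true))
}
```

Without the flush, the app in this model sees `Commit` first and then the fresh tx ahead of the old one; with the flush placed before `Commit`, the fresh tx is seen before `Commit` and the old tx is the first `CheckTx` afterwards.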
Merged to develop |
I'm not sure whether the cause is in Tendermint, but maybe someone can look into the explanation and say whether the race condition is possible.
BUG REPORT (?)
Tendermint version: 0.14.0, from source 88f5f21
Environment:
OS (e.g. from /etc/os-release): Ubuntu 16.04
Install tools: `glide install` / `go install`
What happened:
Under heavy transaction load (sent via `broadcast_tx_sync`) on a single Tendermint node with an ABCI app (communicating with the ABCI app via a TCP socket), the first `ABCI.CheckTx` I get after an `ABCI.Commit` is one of the transactions that should be applied after the transactions from the mempool update (i.e. a "fresh" transaction from `broadcast_tx_sync`).
What you expected to happen:
After an `ABCI.Commit`, the first `ABCI.CheckTx` I get is the first one that didn't get into the committed block, i.e. the first one from the mempool update (i.e. an old transaction).
How to reproduce it (as minimally and precisely as possible):
Can't reproduce easily without the ABCI app, but can elaborate on what I think might be the cause.
`Commit` is called with a lock on the mempool, meaning no calls to `CheckTx` can start. However, since `CheckTx` is called async in the mempool connection, some `CheckTx` might have already "sailed", when the lock is released in the mempool and `Commit` proceeds.
Then, that spurious `CheckTx` has not yet "begun" in the ABCI app (stuck in transport?). Instead, the ABCI app manages to start to process the `Commit`. Next, the spurious, "sailed" `CheckTx` happens in the wrong place.
I have inserted a `FlushSync` call after the `mempool.Lock()` and just before a `Commit` call in https://github.com/tendermint/tendermint/blob/v0.14.0/state/execution.go#L253 and the issue went away.
Anything else do we need to know:
I can provide a patch against `v0.14.0` that fixes that (with the `FlushSync`) but not a PR against `develop` since I can't work with `v0.15.0` just yet.