Skip to content
This repository has been archived by the owner on Apr 28, 2022. It is now read-only.

Solver restart causes to lose track of onging settlement #445

Closed
fleupold opened this issue Apr 1, 2021 · 4 comments
Closed

Solver restart causes to lose track of onging settlement #445

fleupold opened this issue Apr 1, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@fleupold
Copy link
Contributor

fleupold commented Apr 1, 2021

When the solver restarts while it is waiting for a settlement to be mined, the new process doesn't find the pending transaction and just re-issues a new transaction with the same nonce. This can lead to the following error in case the gas price has stayed flat or even decreased compared to the pending tx:

settlement: settlement transaction failed
Caused by:
0: method 'settle(address[],uint256[],(uint256,uint256,address,uint256,uint256,uint32,bytes32,uint256,uint256,uint256,bytes)[],(address,uint256,bytes)[][3])' failure: web3 error: RPC error: Error { code: ServerError(-32010), message: "Transaction gas price supplied is too low. There is another transaction with same nonce in the queue. Try increasing the gas price or incrementing the nonce.", data: None }

Maybe we can add logic to fetch the pending transaction with in the retry code and use its gas price as a starting point for the new transaction (+ min increase).

Tangentially related to gnosis/gp-transaction-retry#3

@fleupold fleupold added the bug Something isn't working label Apr 1, 2021
@vkgnosis
Copy link
Contributor

vkgnosis commented Apr 7, 2021

I don't think there is a reasonable way to recover pending transactions from the node, looking at the eth rpc api and taking load balancing between several nodes into account. To properly fix this we need to keep track of transactions we've sent out in the database. This is also related to being able to react to transactions timing out because the node failed to propagate them.
I envision a table that keeps track of transactions with hash, nonce, gas price. With that we can detect nodes forgetting about transactions and recover previous still pending transactions. Then with this knowledge we can cancel old transactions with noop or overwrite them with new ones.
This makes the retry component more complicated because it is just responsible for one transaction but also has to keep track of state for multiple.

@fleupold
Copy link
Contributor Author

fleupold commented Apr 7, 2021

Does this really require persistence? The approach for finding the pending tx could be to get eth_getBlockByNumber("pending", true) and check in the list of txs if there is one of our transactions. This would only have to happen once at cold start (otherwise we can keep track in memory).

I agree that this is not guaranteed to work if we are behind a load balancer but in this case the node that doesn't think we have a pending tx should also accept our new transaction (so eventually we should either find the existing one or succeed at sending our new one).

I'm worried that adding persistence to the driver (and every other system that wants to use the retry component) is quite a big hammer and I'm not sure we tried more low hanging fruits yet.

@vkgnosis
Copy link
Contributor

vkgnosis commented Apr 8, 2021

in this case the node that doesn't think we have a pending tx should also accept our new transaction

The problem then is that the transaction that will likely get mined is the one in the mempool with higher gas price, that our node doesn't know about. So we see the new transaction as accepted but it would likely fail with nonce already used. Even in the case where we detect the existing transaction and use a higher gas price that can still happen which is something the retry code would have to be made aware of.
I am ok with assuming that it is very unlikely that the loadbalanced node doesn't know about the transaction after restart. It means we the fix won't be perfect but might be good enough.
Still not sure about whether we can reliably get the pending transaction. As we talked about in the retry issue I don't see this behavior documented and we are supposed to work with any node, not just openethereum. Also specifically for getBlockByNumber it's unclear whether that is guaranteed to have our pending transaction because a pending block could for example still have a gas limit (i don't know).

Another idea I had is running our own p2p mempool member. That way we have control over the source of this information and don't have to rely on the node. We could ensure that we always remember all transactions from accounts we care about and there is no danger of the node dropping it for us. We could even do the broadcasting of our new transactions ourself. This is probably not feasible because it adds too much complexity.

@fleupold
Copy link
Contributor Author

fleupold commented Apr 9, 2021

Another idea I had is running our own p2p mempool member.

This is definitely a direction that @koeppelmann wants to go in the long term (either bringing parts of the client into the driver or moving the driver directly into the OE client). However I feel like this is a very large project in itself and for now I'd rather sacrifice our client agnosticism (e.g. assuming we connect to an OE with an unlimited pending block) than building eth p2p components into the core driver (before the product idea is validated).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants