Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ethereum adapter needs better nonce management #776

Closed
benjamincburns opened this issue Mar 24, 2020 · 3 comments · Fixed by #829
Closed

Ethereum adapter needs better nonce management #776

benjamincburns opened this issue Mar 24, 2020 · 3 comments · Fixed by #829
Labels
enhancement New feature or request

Comments

@benjamincburns
Copy link
Contributor

benjamincburns commented Mar 24, 2020

Context

Ethereum transactions are required to contain a nonce. The nonce of a transaction is zero-based, and must be equal to the number of transactions previously submitted (and accepted into the mempool) by the account signing the transaction. If a transaction is submitted with an incorrect nonce, it is rejected before entering the mempool. This includes cases when transactions are submitted in rapid succession, but out of order.

Expected Behavior

  • Caliper will submit transactions in the correct order without nonce errors.
  • Other transaction errors will not trigger nonce errors.

Actual Behavior

  • Caliper sometimes submits transactions out of order. When this happens, all outstanding transactions fail due to incorrect nonces.
  • When transaction submission errors occur that cause the transaction to not enter the mempool, all following transactions for a given account fail.

Possible Fix

There are actually three issues here.

  1. Somehow async requests being made to the node's RPC interface are being reordered, causing them to arrive out of order.
  2. The ethereum adapter does not detect transaction submission failures and update the state of its internal nonce tracking accordingly.
  3. Many transactions are generated at once for the same account, meaning that even if nonce tracking were updated accordingly on transaction submission failures, many in-flight transactions will fail.

Steps to Reproduce

  1. Run a benchmark with a large transaction count against an ethereum client - it's most noticeable when running against ganache-cli, but it also occurs when benchmarking Besu.

Context

This bug is impacting users' ability to benchmark Besu and other compatible ethereum client implementations.

Your Environment

  • Version used: v0.3.0 and current master
  • Environment name and version (e.g. Chrome 39, node.js 5.4): various nodejs versions from v8.17.0 to v11.19.0.
  • Operating System and version (desktop or mobile): Ubuntu 19.10
@benjamincburns
Copy link
Contributor Author

Regarding out of order transactions, so far these are the solutions I've come up with. I'm not overly pleased with them however, so I'd appreciate suggestions for better alternatives.

  1. Use the async-mutex to lock a mutex when the nonce is assigned to the transaction parameters, and unlock it when the transaction hash is received.
  2. Switch to using web3 only for transaction encoding, and use node's lower-level HTTP apis to submit and monitor the transaction, applying the same mutex strategy from the previous solution, but unlock when the request is written to the client.
  3. Move the transaction submission logic to a native module.

Approach 1 is easiest. I've already implemented and tested this locally. It fixes the transaction ordering issue, but it adds significant transaction latency and significantly limits potential TPS, as we must delay until the transaction hash is read. For this reason I think it's not feasible.

Approach 2 is more difficult, but minimizes the problems from approach 1 without switching to a native solution. Unfortunately it complicates the process of nonce management in the face of other transaction submission failures, and would make it impossible for such errors to be handled without cascading into at least a few other transactions, depending on submission rate.

Approach 3 is likely to be even lower latency than approach 2, however it suffers from the same drawbacks as approach 2, while making the code harder to maintain, harder to build, and less portable.

For the moment I think that I prefer approach 2, however I'll keep trying to think of other solutions, and I'd appreciate other ideas from the community before we proceed.

@benjamincburns
Copy link
Contributor Author

I originally assumed that this was happening at the level of libuv or node I/O scheduling, but it may actually be an issue in web3.js. To that end, I went bug hunting in their issues list.

I've definitely seen web3/web3.js#3425 pop up when the RPC endpoint can't be reached, not sure if it's causing nonce ordering issues for us.

web3/web3.js#1846 also looks relevant, though this appears to be specific to cases where web3.js is managing the nonce.

benjamincburns added a commit to benjamincburns/caliper that referenced this issue Mar 31, 2020
Web3 has recently added features that lazily fetches missing information
when signing and sending transactions. In our case, this was the
chainId, and the current gasPrice. Because transactions are submitted
asynchronously, these additional requests would sometimes cause
transactions to be submitted out of order. Even if this probelm were
mitigated, these additional requests added spurious latency spikes to
some transactions, negatively impacting the test results.

This partially addresses hyperledger-caliper#776 by eliminating request reordering issues
for stable websocket connections. This change does not however address
transaction reordering for HTTP connections, or for websocket
connections that require reconnects. It also does not solve the
"christmas tree light" issue w.r.t. nonce errors.

Signed-off-by: Ben Burns <[email protected]>
benjamincburns added a commit to benjamincburns/caliper that referenced this issue Mar 31, 2020
Web3 has recently added features that lazily fetches missing information
when signing and sending transactions. In our case, this was the
chainId, and the current gasPrice. Because transactions are submitted
asynchronously, these additional requests would sometimes cause
transactions to be submitted out of order. Even if this probelm were
mitigated, these additional requests added spurious latency spikes to
some transactions, negatively impacting the test results.

This partially addresses hyperledger-caliper#776 by eliminating request reordering issues
for stable websocket connections. This change does not however address
transaction reordering for HTTP connections, or for websocket
connections that require reconnects. It also does not solve the
"christmas tree light" issue w.r.t. nonce errors.

Signed-off-by: Ben Burns <[email protected]>
benjamincburns added a commit to benjamincburns/caliper that referenced this issue Mar 31, 2020
Web3 has recently added features that lazily fetches missing information
when signing and sending transactions. In our case, this was the
chainId, and the current gasPrice. Because transactions are submitted
asynchronously, these additional requests would sometimes cause
transactions to be submitted out of order. Even if this probelm were
mitigated, these additional requests added spurious latency spikes to
some transactions, negatively impacting the test results.

This partially addresses hyperledger-caliper#776 by eliminating request reordering issues
for stable websocket connections. This change does not however address
transaction reordering for HTTP connections, or for websocket
connections that require reconnects. It also does not solve the
"christmas tree light" issue w.r.t. nonce errors.

Signed-off-by: Ben Burns <[email protected]>
@aklenik aklenik added the enhancement New feature or request label Apr 7, 2020
@benjamincburns
Copy link
Contributor Author

benjamincburns commented May 6, 2020

I've been testing the now-merged PRs that address this issue for about a month now, and things are working well. Those PRs however don't address the issue for RPC connections over HTTP.

Unfortunately, it seems the best course of action is to error when HTTP connections are used. My rationale here is threefold.

  1. web3.js have deprecated support for HTTP providers, and even if they don't remove them entirely, they've already made changes since deprecating them that degrade their overall stability.
  2. It's far too easy for transactions to be submitted out of order when using HTTP, as there's no synchronous, stateful connection management like we have with websockets.
  3. Any workarounds to the above two problems involve introducing locks which delay until a response is received from the transaction submission. This sort of locking effectively eliminates all request pipelining benefits, which in turn severely degrades the performance of the system.

I'll be submitting a PR shortly that introduces this error, and another to the docs pages explaining this rationale.

benjamincburns added a commit to benjamincburns/caliper that referenced this issue May 6, 2020
benjamincburns added a commit to benjamincburns/caliper that referenced this issue May 6, 2020
benjamincburns added a commit to benjamincburns/caliper that referenced this issue May 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants