Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Looks like broadcast_tx_commit sometimes returns nonce error erroneously #3054

Closed
vgrichina opened this issue Jul 29, 2020 · 12 comments · Fixed by #3061
Closed

Looks like broadcast_tx_commit sometimes returns nonce error erroneously #3054

vgrichina opened this issue Jul 29, 2020 · 12 comments · Fixed by #3061
Assignees
Labels
A-RPC Area: rpc C-bug Category: This is a bug P-critical Priority: critical

Comments

@vgrichina
Copy link
Collaborator

vgrichina commented Jul 29, 2020

Describe the bug

Getting following error:

AccountAlreadyExists [Error]: Can't create a new account listens-counter-test-user-1595989309496-47, because it already exists
    at Object.parseRpcError (/Users/vg/Documents/nearlib/lib/utils/rpc_errors.js:38:19)
    at Account.signAndSendTransaction (/Users/vg/Documents/nearlib/lib/account.js:140:36)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async /Users/vg/Documents/clients/audius/listens-counter/src/benchmark.js:46:17
    at async Promise.all (index 7)
    at async runBenchmark (/Users/vg/Documents/clients/audius/listens-counter/src/benchmark.js:52:38) {
  type: 'AccountAlreadyExists',
  context: undefined,
  index: 0,
  account_id: 'listens-counter-test-user-1595989309496-47'
}

For code which shouldn't ever try to create same account twice unless there is nonce error:
https://github.com/near-examples/listens-counter/blob/master/src/benchmark.js

To Reproduce

  1. Make sure you use near-api-js with retry-nonce branch.
  2. Compile the contract: npm run build:contract
  3. Run node src/benchmark.js 80 50, sometimes it just happens.

Expected behavior

/Transaction nonce \d+ must be larger than nonce of the used access key \d+/ errors should only be returned when transaction does indeed need to be re-tried with a different nonce.

Right now it seems like there is some race condition and original transaction (for which error is returned) still lands:
https://explorer.testnet.near.org/accounts/listens-counter-test-user-1595989309496-47

Version:

  • testnet

Additional context
Add any other context about the problem here.

@vgrichina vgrichina added C-bug Category: This is a bug P-high Priority: High labels Jul 29, 2020
@vgrichina
Copy link
Collaborator Author

Looks like can fail on CI as well:
https://travis-ci.com/github/near/near-api-js/jobs/365987896

@evgenykuzyakov
Copy link
Collaborator

There is global retry as well, so it's possible that the near-api-js retries the transaction for some other error, not necessary nonce related. E.g. the first one worked and the local nonce was increased, but then the TX was lost by RPC even though it was forwarded, then it retries it again with the increased nonce and this one fails with conflicting account.

https://github.com/near/near-api-js/blob/4963d922c6f7783129bedfac0c232c4cb1606e42/src/account.ts#L117

@bowenwang1996
Copy link
Collaborator

@vgrichina what @evgenykuzyakov described is certainly a possibility.

@vgrichina
Copy link
Collaborator Author

@vgrichina
Copy link
Collaborator Author

I've did another run and it looks like following is going on:

Retrying transaction listens-counter-test-user-1596059687803-89:GigziPi9hgkj6s38MoLSyUa33FuU22M8PZDkGBbYLpgw as it has timed out

transaction takes a lot of time, so gets resubmitted because of timeout

Retrying transaction listens-counter-test-user-1596059687803-89:GigziPi9hgkj6s38MoLSyUa33FuU22M8PZDkGBbYLpgw with new nonce.

nonce error gets returned erroneously, this log is only really possible for nonce error. Looks like race condition in determining whether transaction is the same that landed with given nonce? @bowenwang1996

Failure [listens-counter-test-user-1596059687803-89]: Error: Can't create a new account listens-counter-test-user-1596059687803-89, because it already exists
AccountAlreadyExists [Error]: Can't create a new account listens-counter-test-user-1596059687803-89, because it already exists

on retry with new nonce we get this

@vgrichina
Copy link
Collaborator Author

Looks like this happens for other transactions types as well, like adding a key:
https://travis-ci.com/github/near/near-api-js/jobs/366096520

@maxzaver maxzaver assigned frol and unassigned evgenykuzyakov and maxzaver Jul 29, 2020
@maxzaver maxzaver added the A-RPC Area: rpc label Jul 29, 2020
@maxzaver
Copy link
Contributor

I have noticed a very similar error recently in the bridge Near-One/rainbow-bridge#229. It was not seen before, however I have attributed it to some bug in bridge or near-api-js. @vgrichina , @frol , @bowenwang1996 is it possible that we broke something in the nearcore or near-api-js ?

CC @ailisp

@bowenwang1996
Copy link
Collaborator

@vgrichina are you submitting transactions through the rpc endpoint or your own node?

@vgrichina
Copy link
Collaborator Author

@bowenwang1996 I never use my own node.

@vgrichina
Copy link
Collaborator Author

@nearmax in my testing I for sure do use different near-api-js (retry-nonce branch), however main change which seems to expose error is that transactions are retried when there is error with nonce collision. We didn't do it before in near-api-js.

But also my understanding is that before recent change from @bowenwang1996 in nearcore re-submitting same transaction would just always fail with wrong nonce error. Seems like now it sometimes fails with wrong nonce error.

@frol
Copy link
Collaborator

frol commented Jul 30, 2020

It seems that the following happens:

  1. a transaction gets sent (via broadcast_tx_commit)
  2. it goes through the RPC -> mempool -> network
  3. the RPC times out waiting to observe the transaction to be applied
  4. near-api-js re-sends the transaction with the same nonce
  5. RPC checks if the transaction with such a hash already included in the chain (it seems we catch race condition here), and with the slow network / long queue of transactions it might not be included yet, so RPC proceeds and tries to send this transaction again, but we happen to receive the new block where the transaction has been included and the nonce get updated, so we hit this "incorrect nonce" error. To resolve this issue, we should internally handle the incorrect nonce error and check the transaction status again instead of returning the error immediately. I can send a patch, though I don't know how to write a test for this issue (was mocknet designed to deal with such cases).

@vgrichina vgrichina added P-critical Priority: critical and removed P-high Priority: High labels Jul 30, 2020
@vgrichina
Copy link
Collaborator Author

@nearmax @bowenwang1996 @frol I think this is actually a P0 as can easily result in double spends. Let's fix it for next release ideally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-RPC Area: rpc C-bug Category: This is a bug P-critical Priority: critical
Projects
None yet
5 participants