Skip to content
This repository has been archived by the owner on Mar 1, 2019. It is now read-only.

Wallet crashes immediately on POST externally-owned #353

Open
Anviking opened this issue Feb 20, 2019 · 6 comments
Open

Wallet crashes immediately on POST externally-owned #353

Anviking opened this issue Feb 20, 2019 · 6 comments
Labels
BUG Something isn't working

Comments

@Anviking
Copy link
Member

Anviking commented Feb 20, 2019

Release Operating System Cause
next Windows & OSX & Linux) Code v Configuration v Environment v Human v Unknown

Context

We were investigating strange occasional CI-failures in the EOS integration tests. The wallet would sometimes crash. The crashes didn't seem connected to any specific test, and seemed to have a very indirect cause.

After some time experimenting in the integration tests, I tried curling it myself.

Steps to Reproduce

  1. Launch the wallet.
stack exec -- cardano-wallet-server                               \
  --configuration-key       mainnet_dryrun_full                   \
  --configuration-file      ../cardano-sl/lib/configuration.yaml  \
  --db-path                 ./state-staging/db                    \
  --wallet-db-path          ./state-staging/wallet-db             \
  --tls-ca-cert             ./state-staging/tls/server/ca.crt     \
  --tls-node-client-cert    ./state-staging/tls/client/client.crt \
  --tls-node-client-key     ./state-staging/tls/client/client.key \
  --tls-wallet-server-cert  ./state-staging/tls/server/server.crt \
  --tls-wallet-server-key   ./state-staging/tls/server/server.key \
  --topology                ./topology-examples/testnet.yaml      \
  --system-start            (date +%s)                            \
  --node-id                 wallet-node                           \
  --node-api-address        127.0.0.1:8080                        \
  --wallet-api-address      127.0.0.1:8090                        \
  --wallet-doc-address      127.0.0.1:8100                        \
  --rebuild-db                                                    \
  --wallet-rebuild-db
  1. Create a new EOS wallet with the JSON data provided by the docs (I used a JSON minify website to remove newlines)
curl -X POST https://localhost:8090/api/v1/wallets/externally-owned \
  -H "Accept: application/json; charset=utf-8" \
  -H "Content-Type: application/json; charset=utf-8" \
  --cert ./state-staging/tls/client/client.pem  \
  --cacert ./state-staging/tls/client/ca.crt \
  --data '{   "accounts": [     {       "publicKey": "1OQQ6jrO8xzPyybgLEk5vuUcoCCH4fds3k5rqnxErglRb7EiGQKa74TP9jx0ATHCqUiD8uLO6pP8z31+c393lw==",       "index": 2710723942     },     {       "publicKey": "bbaXliRAMq5OoT7ftKYqOLMNILvS2BoImWkdwm1dHtqQYOqgaA5tfIbagl5zG+W91cSseMaeXfPomCZ4zwRDeQ==",       "index": 3524158882     }   ],   "assuranceLevel": "normal",   "addressPoolGap": 66,   "name": "My EOS-Wallet" }'

Post 76f1bd8 (renamed endpoint) we need to use:

curl -X POST https://localhost:8090/api/v1/externally-owned-wallets \
  -H "Accept: application/json; charset=utf-8" \
  -H "Content-Type: application/json; charset=utf-8" \
  --cert ./state-staging/tls/client/client.pem  \
  --cacert ./state-staging/tls/client/ca.crt \
  --data '{"accounts":[{"publicKey":"1OQQ6jrO8xzPyybgLEk5vuUcoCCH4fds3k5rqnxErglRb7EiGQKa74TP9jx0ATHCqUiD8uLO6pP8z31+c393lw==","index":2710723942},{"publicKey":"bbaXliRAMq5OoT7ftKYqOLMNILvS2BoImWkdwm1dHtqQYOqgaA5tfIbagl5zG+W91cSseMaeXfPomCZ4zwRDeQ==","index":3524158882}],"assuranceLevel":"normal","addressPoolGap":66,"name":"My EOS-Wallet"}'

Expected behavior

  1. The wallet must not crash
  2. The JSON sample in the docs should be correct.

Actual behavior

Wallet crashes within about 0 to 2 seconds.

[cardano-sl.*production*:Info:ThreadId 1749] [2019-02-20 20:56:51.59 UTC] Verifying and applying blocks...
[cardano-sl.*production*:Debug:ThreadId 1749] [2019-02-20 20:56:51.59 UTC] Rolling: verifying
[cardano-sl.*production*:Info:ThreadId 1749] [2019-02-20 20:56:51.59 UTC] slogVerifyBlocks: Consensus era is Original
[cardano-sl.node:Error:ThreadId 1635] [2019-02-20 20:56:51.70 UTC] Root key missing for [Q5GVQ9zELsWNihPADkaV9fD7sDubRa2BpzMrgGYNrQstFjtSJ]
[cardano-sl.node:Debug:ThreadId 1635] [2019-02-20 20:56:51.70 UTC] Stopping wallet worker.
[cardano-sl.diffusion.outboundqueue.wallet-node:Warning:ThreadId 1743] [2019-02-20 20:56:51.70 UTC] sending MsgRequestBlocks (fromList [NodeId 52.68.30.216:3000:0]) to NodeId 52.68.30.216:3000:0 failed with AsyncCancelled :: SomeException
[cardano-sl.node:Error:ThreadId 1670] [2019-02-20 20:56:51.70 UTC] stopping the wallet submission layer...
[cardano-sl.diffusion:Error:ThreadId 1638] [2019-02-20 20:56:51.70 UTC] stopping with exception AsyncCancelled
[cardano-sl.node:Error:ThreadId 4] [2019-02-20 20:56:51.70 UTC] logException: ErrNotEnoughAddresses
CallStack (from HasCallStack):
  error, called at src/Universum/Debug.hs:60:11 in universum-1.2.0-2E8EL6k5Fv214PFWOWwkIj:Universum.Debug
  error, called at src/Cardano/Wallet/Kernel/Read.hs:142:36 in cardano-wallet-2.0.0-FUPVlR0zztTET9K8hbLby1:Cardano.Wallet.Kernel.Read
cardano-wallet-server: ErrNotEnoughAddresses
CallStack (from HasCallStack):
  error, called at src/Universum/Debug.hs:60:11 in universum-1.2.0-2E8EL6k5Fv214PFWOWwkIj:Universum.Debug
  error, called at src/Cardano/Wallet/Kernel/Read.hs:142:36 in cardano-wallet-2.0.0-FUPVlR0zztTET9K8hbLby1:Cardano.Wallet.Kernel.Read

Resolution Plan

PR

Number Base
#? develop

QA

@Anviking Anviking added the BUG Something isn't working label Feb 20, 2019
@piotr-iohk
Copy link
Contributor

From what I see after you start a node again after this initial POST that crashed it, then it will crash itself again without any action after few seconds. It seems that this one single POST breakes the whole node permamently.
Also if you manage to do another POST before it crashes then this log entry has more addresses. One addres per each POST that was made...

[cardano-sl.node:Error:ThreadId 329] [2019-02-21 07:35:23.77 UTC] Root key missing for [NfU64bTcXuTK3ncVMn1FUwLjYYF5sXYbjo8JzcUzBf2J9AxSv, jtXt64VRC3vWMbrz6qFHkKDX11VyJhdJ7TDmhyeCLQqVEyEX8, jteoY5pyULxgEBsWVSo4KKQb6pZNSffU78zjtW7kkVyeWowGT, kgw1yXLubiksxZ3LsNbdK2DozKDJu2YUL5fnZWsM4HtbZFJ8y, m8yXXN7NGt6gEqmMDhBbbQcGzwyDJ5AcrFaHyWqoZNTnBZGnj, mf8K2jTHPwwCrVBQ3pteZiVTiRjtetCAQxzgjYLwhUJtEoY2k]

@denisshevchenko
Copy link
Contributor

denisshevchenko commented Feb 21, 2019

This bug occur chaotically: sometimes 1 failed tests per 5 successful ones, sometimes 1 failed tests per 15 successful ones. @piotr-iohk said:

interesting as there are several (29 exactly) integration tests doing POST to EosWallets (they're tagged EOSWALLETS_CREATE, some of them even using parts of data from the docs, like EOSWALLETS_CREATE_03)... so far they were passing

In most cases last output in tests contains something like this:

  EOSWALLETS_UPDATE_04 - one has to provide assuranceLevel to be either 'normal' or 'strict'
    assuranceLevel = normal
    assuranceLevel = strict
    assuranceLevel = empty string
    assuranceLevel = 555 FAILED [1]
    assuranceLevel = 亜哀愛źiemniak悪握圧扱安 FAILED [2]
  EOSWALLETS_UPDATE_05 - one has to provide all required parameters
    no addressPoolGap FAILED [3]
    no assuranceLevel FAILED [4]
    no name FAILED [5]

or like this:

  EOSWALLETS_UPDATE_05 - one has to provide all required parameters
    no addressPoolGap
    no assuranceLevel
    no name FAILED [1]

Actual error is this:

       uncaught exception: IOException of type UserError
       user error (expected a successful response but got an error: ClientHttpError (ConnectionError "HttpExceptionRequest Request {\n  host                 = \"127.0.0.1\"\n  port                 = 8090\n  secure               = True\n  requestHeaders       = []\n  path                 = \"/api/internal/reset-wallet-state\"\n  queryString          = \"\"\n  method               = \"DELETE\"\n  proxy                = Nothing\n  rawBody              = False\n  redirectCount        = 10\n  responseTimeout      = ResponseTimeoutDefault\n  requestVersion       = HTTP/1.1\n}\n (InternalException (HostCannotConnect \"127.0.0.1\" [Network.Socket.connect: <socket: 126>: does not exist (Connection refused)]))"))

It means that something kills the node, and (some) next request after that cannot be handled, because node is already dead.

Last output in logs is:

...
[31m[cardano-sl.edge:Error:ThreadId 916][0m [2019-02-18 14:47:57.23 UTC] Root key missing for [mDZnjKMvjfxdd1fTQF5v9gRdg1SSyoqQDY77QXQJdHzFxPhDj]
[cardano-sl.edge:Debug:ThreadId 916] [2019-02-18 14:47:57.23 UTC] Stopping wallet worker.
[34m[cardano-sl.walletapi:Info:ThreadId 1647][0m [2019-02-18 14:47:57.23 UTC] 
[31mDELETE[0m [37mRequest #137[0m
    [37m:>[0m api
    [37m:>[0m internal
    [37m:>[0m reset-wallet-state
[31m[cardano-sl.edge:Error:ThreadId 961][0m [2019-02-18 14:47:57.23 UTC] stopping the wallet submission layer...
[31m[cardano-sl.diffusion:Error:ThreadId 919][0m [2019-02-18 14:47:57.23 UTC] stopping with exception AsyncCancelled

This is corresponding code (https://github.com/input-output-hk/cardano-wallet/blob/develop/src/Cardano/Wallet/Kernel.hs#L213-L214):

     (_walletLogMessage walletPassive) Error "stopping the wallet submission layer..."
     cancel submissionLayerTicker

@denisshevchenko
Copy link
Contributor

denisshevchenko commented Feb 21, 2019

@Anviking found the code that tells us about possible reason of this bug (https://github.com/input-output-hk/cardano-wallet/blob/develop/src/Cardano/Wallet/WalletLayer/Kernel/Internal.hs#L36-L42):

    modifyMVar_ (w ^. Kernel.walletSubmission) $ \_ -> do

        -- clear both dbs.
        update' (w ^. Kernel.wallets) $ ClearDB
        clearMetaDB (w ^. Kernel.walletMeta)
        -- clear submission state.
        return Submission.emptyWalletSubmission

If we comment out clearMetaDB line - bug doesn't occur anymore.

Function clearMetaDB calls another variant of clearMetaDB (https://github.com/input-output-hk/cardano-wallet/blob/develop/src/Cardano/Wallet/Kernel/DB/TxMeta.hs#L26) which deletes _mDbOutputs, _mDbInputs and _mDbMeta (https://github.com/input-output-hk/cardano-wallet/blob/develop/src/Cardano/Wallet/Kernel/DB/Sqlite.hs#L456-L461).

So it looks like that the reason of this bug is some incorrect clearing actions in DB.

@piotr-iohk
Copy link
Contributor

^ as @Anviking pointed:

Maybe there is not enough time for them to fail

I know think that this issue might be the culprit of what we actually observed on CI and locally in the integration tests...

@KtorZ
Copy link
Contributor

KtorZ commented Feb 21, 2019

Has anyone been able to reproduce this with executing only one test (repeatedly)? There might be a intricate inter-dependencies between tests (I better hope not), but that would already help to know this :/

@piotr-iohk
Copy link
Contributor

I was able to reproduce with the request from this issue:

curl -X POST https://localhost:8090/api/v1/wallets/externally-owned \
  -H "Accept: application/json; charset=utf-8" \
  -H "Content-Type: application/json; charset=utf-8" \
  --cert ./state-staging/tls/client/client.pem  \
  --cacert ./state-staging/tls/client/ca.crt \
  --data '{   "accounts": [     {       "publicKey": "1OQQ6jrO8xzPyybgLEk5vuUcoCCH4fds3k5rqnxErglRb7EiGQKa74TP9jx0ATHCqUiD8uLO6pP8z31+c393lw==",       "index": 2710723942     },     {       "publicKey": "bbaXliRAMq5OoT7ftKYqOLMNILvS2BoImWkdwm1dHtqQYOqgaA5tfIbagl5zG+W91cSseMaeXfPomCZ4zwRDeQ==",       "index": 3524158882     }   ],   "assuranceLevel": "normal",   "addressPoolGap": 66,   "name": "My EOS-Wallet" }'

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
BUG Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants