This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

hello-gwasm-runner gets stuck after finishing several subtasks #5165

Open
marmistrz opened this issue Apr 20, 2020 · 0 comments
Labels
0.23 release bug P3 Severity-Low/Effort-hard

Comments

marmistrz commented Apr 20, 2020

Description

Golem Version: 0.22.2

Golem-Messages version (leave empty if unsure):

Electron version (if used): N/A

OS: Ubuntu 18.04

Branch (if launched from source): develop

Mainnet/Testnet: testnet

Description of the issue:

I set up a subnet of nodes on one machine, which doesn't have a public IP address, as described in this wiki page.

I created 1 requestor and 2 provider nodes. Then I used gwasm-runner to launch the hello-gwasm-runner task as:

gwasm-runner --backend=Brass target/wasm32-unknown-emscripten/release/hello_world.wasm

The workload then got stuck at 7/10 progress. Adding additional provider nodes seems to do the trick: the task succeeded with 4 provider nodes.

Those provider nodes had been running for about 24h when I tried to run the task again, and the job got stuck at 7/10 again. Adding a fresh provider node did the trick again.

Logs and any additional context

$ for i in 1 2 3 4 5; do golemcli -d /home/marcin/golem/datadir$i/ -p 6100$i tasks subtasks list 8cd3f0be-82ea-11ea-8a50-6110940bba10; done
┌────────┬────────────────────────────────────────┬─────────────┬────────────┐
│  node  │  subtask id                            │  status     │  progress  │
├────────┼────────────────────────────────────────┼─────────────┼────────────┤
│        │  9841079e-82ea-11ea-a6fa-6110940bba10  │  Finished   │  100.0 %   │
│        │  98416ab0-82ea-11ea-b784-6110940bba10  │  Finished   │  100.0 %   │
│        │  9841e176-82ea-11ea-8ebd-6110940bba10  │  Finished   │  100.0 %   │
│        │  98423ba2-82ea-11ea-9094-6110940bba10  │  Finished   │  100.0 %   │
│        │  98429dc0-82ea-11ea-aaf3-6110940bba10  │  Finished   │  100.0 %   │
│        │  98430dc6-82ea-11ea-9887-6110940bba10  │  Finished   │  100.0 %   │
│        │  984368ba-82ea-11ea-8c1b-6110940bba10  │  Finished   │  100.0 %   │
│        │  9843e07e-82ea-11ea-bd8e-6110940bba10  │  Finished   │  100.0 %   │
│        │  98443bb8-82ea-11ea-b341-6110940bba10  │  Finished   │  100.0 %   │
│        │  9844ace4-82ea-11ea-8f1c-6110940bba10  │  Finished   │  100.0 %   │
│        │  9845120a-82ea-11ea-8e03-6110940bba10  │  Finished   │  100.0 %   │
│        │  9845885a-82ea-11ea-a0af-6110940bba10  │  Finished   │  100.0 %   │
│        │  9845e6ee-82ea-11ea-b7eb-6110940bba10  │  Finished   │  100.0 %   │
│        │  98465612-82ea-11ea-82ba-6110940bba10  │  Finished   │  100.0 %   │
│        │  9846b002-82ea-11ea-8e3f-6110940bba10  │  Verifying  │  0.0 %     │
│        │  984721f8-82ea-11ea-885f-6110940bba10  │  Verifying  │  0.0 %     │
│        │  98477db6-82ea-11ea-ab6b-6110940bba10  │  Verifying  │  0.0 %     │
│        │  a2e4b938-82ea-11ea-8e9d-6110940bba10  │  Timeout    │  0.0 %     │
│        │  a2e69ba2-82ea-11ea-ba7f-6110940bba10  │  Timeout    │  0.0 %     │
│        │  a2e89106-82ea-11ea-916e-6110940bba10  │  Timeout    │  0.0 %     │
│        │  13ef7d74-82ec-11ea-9999-6110940bba10  │  Starting   │  0.0 %     │
│        │  13f159ee-82ec-11ea-9187-6110940bba10  │  Starting   │  0.0 %     │
│        │  13f3468c-82ec-11ea-9d21-6110940bba10  │  Starting   │  0.0 %     │
└────────┴────────────────────────────────────────┴─────────────┴────────────┘
No subtasks
No subtasks
No subtasks
No subtasks
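To make the table easier to compare between runs, the statuses can be tallied with a small script. This is only a sketch: it assumes the output of `golemcli tasks subtasks list` has been saved as text in the box-drawing layout shown above, and the function name `tally_subtask_statuses` is made up for illustration (it is not part of Golem).

```python
from collections import Counter


def tally_subtask_statuses(table_text: str) -> Counter:
    """Count subtasks per status in a saved `golemcli tasks subtasks list` table."""
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        # Data rows start with the vertical box-drawing character;
        # borders and "No subtasks" lines do not.
        if not line.startswith("│"):
            continue
        cells = [cell.strip() for cell in line.strip("│").split("│")]
        # Assumed column order: node, subtask id, status, progress.
        if len(cells) != 4 or cells[1] == "subtask id":
            continue
        counts[cells[2]] += 1
    return counts


# A few rows copied from the table above.
sample = """\
┌────────┬────────────────────────────────────────┬─────────────┬────────────┐
│  node  │  subtask id                            │  status     │  progress  │
├────────┼────────────────────────────────────────┼─────────────┼────────────┤
│        │  9845e6ee-82ea-11ea-b7eb-6110940bba10  │  Finished   │  100.0 %   │
│        │  98465612-82ea-11ea-82ba-6110940bba10  │  Finished   │  100.0 %   │
│        │  9846b002-82ea-11ea-8e3f-6110940bba10  │  Verifying  │  0.0 %     │
│        │  a2e4b938-82ea-11ea-8e9d-6110940bba10  │  Timeout    │  0.0 %     │
│        │  13ef7d74-82ec-11ea-9999-6110940bba10  │  Starting   │  0.0 %     │
└────────┴────────────────────────────────────────┴─────────────┴────────────┘
No subtasks
"""

if __name__ == "__main__":
    for status, count in tally_subtask_statuses(sample).items():
        print(f"{status}: {count}")
```

The full table above contains 14 Finished, 3 Verifying, 3 Timeout and 3 Starting subtasks.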

Excerpt from the requestor logs:

b09..a07693ff'
INFO     [golem.task.taskrequestorstats      ] Received work offers. offer_count=500, task_id='7aca90ac-824d-11ea-b476-6110940bba10'
INFO     [apps.wasm                          ] Node 1acb9b09d2d097819b1031ad7e918a8708bd0b24943900e1d704d1c968f22d7a3e5bff1e9a8a769468b11db46d6783358dfd0468fd14d0df6ee6adcea07693ff has been blacklisted for this task
INFO     [golem.task.taskserver              ] provider 1acb9b09d2d097819b1031ad7e918a8708bd0b24943900e1d704d1c968f22d7a3e5bff1e9a8a769468b11db46d6783358dfd0468fd14d0df6ee6adcea07693ff is not allowed for this task at this moment (either waiting for results or previously failed)
INFO     [golem.task.tasksession             ] Received offer to compute. task_id='7aca90ac-824d-11ea-b476-6110940bba10', node='37c64f7a..d543ea5a'
INFO     [golem.task.taskrequestorstats      ] Received work offers. offer_count=501, task_id='7aca90ac-824d-11ea-b476-6110940bba10'
INFO     [apps.wasm                          ] Node 37c64f7acb93ce0cd899cd2dd5d8ab9619e794df8c0063b683ccce1c57ea8e7d906161f3fb32d1086d3da620a2eebaa8b2a677e0f6f82f93944b512dd543ea5a has been blacklisted for this task
INFO     [golem.task.taskserver              ] provider 37c64f7acb93ce0cd899cd2dd5d8ab9619e794df8c0063b683ccce1c57ea8e7d906161f3fb32d1086d3da620a2eebaa8b2a677e0f6f82f93944b512dd543ea5a is not allowed for this task at this moment (either waiting for results or previously failed)
INFO     [golem.task.tasksession             ] Received offer to compute. task_id='7aca90ac-824d-11ea-b476-6110940bba10', node='1acb9b09..a07693ff'
INFO     [golem.task.taskrequestorstats      ] Received work offers. offer_count=502, task_id='7aca90ac-824d-11ea-b476-6110940bba10'
INFO     [apps.wasm                          ] Node 1acb9b09d2d097819b1031ad7e918a8708bd0b24943900e1d704d1c968f22d7a3e5bff1e9a8a769468b11db46d6783358dfd0468fd14d0df6ee6adcea07693ff has been blacklisted for this task
INFO     [golem.task.taskserver              ] provider 1acb9b09d2d097819b1031ad7e918a8708bd0b24943900e1d704d1c968f22d7a3e5bff1e9a8a769468b11db46d6783358dfd0468fd14d0df6ee6adcea07693ff is not allowed for this task at this moment (either waiting for results or previously failed)
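In the excerpt above, both node IDs that appear are reported as blacklisted for the task, which may be why only a fresh provider could make progress. A rough sketch for listing the blacklisted nodes from a saved requestor log follows. It assumes the `apps.wasm` message format is exactly as shown in the excerpt; the helper name `blacklisted_nodes` is hypothetical and not part of Golem.

```python
import re
from collections import Counter

# Matches the `apps.wasm` message exactly as it appears in the excerpt above.
BLACKLIST_RE = re.compile(r"Node ([0-9a-f]+) has been blacklisted for this task")


def blacklisted_nodes(log_text: str) -> Counter:
    """Count blacklist messages per node, keyed by an abbreviated node ID."""
    counts = Counter()
    for match in BLACKLIST_RE.finditer(log_text):
        node_id = match.group(1)
        # Abbreviate the ID the same way the tasksession log lines do.
        counts[f"{node_id[:8]}..{node_id[-8:]}"] += 1
    return counts


# Three `apps.wasm` lines copied from the excerpt above.
sample = (
    "INFO     [apps.wasm                          ] Node "
    "1acb9b09d2d097819b1031ad7e918a8708bd0b24943900e1d704d1c968f22d7a"
    "3e5bff1e9a8a769468b11db46d6783358dfd0468fd14d0df6ee6adcea07693ff"
    " has been blacklisted for this task\n"
    "INFO     [apps.wasm                          ] Node "
    "37c64f7acb93ce0cd899cd2dd5d8ab9619e794df8c0063b683ccce1c57ea8e7d"
    "906161f3fb32d1086d3da620a2eebaa8b2a677e0f6f82f93944b512dd543ea5a"
    " has been blacklisted for this task\n"
    "INFO     [apps.wasm                          ] Node "
    "1acb9b09d2d097819b1031ad7e918a8708bd0b24943900e1d704d1c968f22d7a"
    "3e5bff1e9a8a769468b11db46d6783358dfd0468fd14d0df6ee6adcea07693ff"
    " has been blacklisted for this task\n"
)

if __name__ == "__main__":
    for node, count in blacklisted_nodes(sample).items():
        print(f"{node}: blacklisted message seen {count} time(s)")
```

If every running provider shows up in this list, the remaining subtasks have no eligible provider until a new node joins.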
@marmistrz marmistrz added bug brass P3 Severity-Low/Effort-hard labels Apr 20, 2020
@badb badb removed the brass label Jun 22, 2020