avocado-runner-* crashes when status server socket closes #5794

clebergnu · 2023-11-09T12:57:13Z

Describe the bug
All runners built on top of avocado.core.nrunner.app.BaseRunnerApp will use TaskStatusService.post() to send status messages to the status server. If the socket get closed, there will be an EPIPE returned during send(), followed by a SIGPIPE, followed by a BrokenPipeError that is not handled, which results in a crash.

Steps to reproduce

nc -l -U /tmp/socket &
avocado-runner-rogue task-run -i ID -u x-avocado-runner-rogue -s /tmp/socket &
kill %1

Expected behavior
The runners should be more resilient when the status server is not operating properly. Ignoring failures is not a good option, so maybe runners will need a way to queue and persist messages they can't initially send to status servers.

But known crashes should be prevented, and unknown crashes (with information such as backtraces) should be captured.

Current behavior

avocado-runner-rogue task-run -i ID -u x-avocado-runner-rogue -s /tmp/socket &
[1] 747815
{'status': 'started', 'time': 142105.739663621, 'output_dir': '/tmp/.avocado-task-djf9hljx', 'id': 'ID'}
{'status': 'running', 'time': 142105.739762377, 'id': 'ID'}
...
{'status': 'running', 'time': 142113.250716832, 'id': 'ID'}
Traceback (most recent call last):
  File "/home/cleber/.local/bin/avocado-runner-rogue", line 33, in <module>
    sys.exit(load_entry_point('avocado-rogue', 'console_scripts', 'avocado-runner-rogue')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cleber/src/avocado/avocado/examples/plugins/tests/rogue/avocado_rogue/runner.py", line 55, in main
    app.run()
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/app.py", line 179, in run
    return kallable(args)
           ^^^^^^^^^^^^^^
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/app.py", line 293, in command_task_run
    for status in task.run():
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/task.py", line 213, in run
    status_service.post(status)
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/task.py", line 65, in post
    self.connection.send(data.encode("ascii") + "\n".encode("ascii"))
BrokenPipeError: [Errno 32] Broken pipe

System information (please complete the following information):

OS: LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch Distributor ID: Fedora Description: Fedora release 38 (Thirty Eight) Release: 38 Codename: ThirtyEight
Avocado version: a038820
Avocado installation method: Git

Additional information
This was found under #5788

The text was updated successfully, but these errors were encountered:

This commit adds error handling to TaskStatusService. When the connection is lost, it will try to establish a new connection. If the connection is not possible to renew, the task will send warning message about new status and remove TaskStatusService from available services. Reference: avocado-framework#5794 Signed-off-by: Jan Richter <[email protected]>

This commit adds error handling to TaskStatusService. When the connection is lost, it will try to establish a new connection. If the connection is not possible to renew, the task will send warning message about new status and remove TaskStatusService from available services. Reference: avocado-framework#5794 Signed-off-by: Jan Richter <[email protected]> Signed-off-by: Cleber Rosa <[email protected]>

richtja · 2024-02-02T13:14:22Z

Solved in #5860

This commit adds error handling to TaskStatusService. When the connection is lost, it will try to establish a new connection. If the connection is not possible to renew, the task will send warning message about new status and remove TaskStatusService from available services. Reference: avocado-framework#5794 Signed-off-by: Jan Richter <[email protected]> Signed-off-by: Cleber Rosa <[email protected]>

clebergnu added the bug label Nov 9, 2023

clebergnu added this to the #104 - Codename TBD milestone Nov 9, 2023

clebergnu added this to Default project Nov 9, 2023

mr-avocado bot moved this to Triage in Default project Nov 9, 2023

clebergnu mentioned this issue Nov 9, 2023

Spawner: make terminate_task return success or failure #5788

Merged

richtja moved this from Triage to Short Term (Current Q) Backlog in Default project Nov 30, 2023

richtja self-assigned this Jan 8, 2024

richtja moved this from Short Term (Current Q) Backlog to In progress in Default project Jan 29, 2024

richtja mentioned this issue Jan 30, 2024

TaskStatusService connection lost fix #5860

Merged

richtja mentioned this issue Feb 2, 2024

Disconnected test execution #5862

Open

richtja closed this as completed Feb 2, 2024

github-project-automation bot moved this from In progress to Done 104 in Default project Feb 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avocado-runner-* crashes when status server socket closes #5794

avocado-runner-* crashes when status server socket closes #5794

clebergnu commented Nov 9, 2023

richtja commented Feb 2, 2024

avocado-runner-* crashes when status server socket closes #5794

avocado-runner-* crashes when status server socket closes #5794

Comments

clebergnu commented Nov 9, 2023

richtja commented Feb 2, 2024