Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avocado-runner-* crashes when status server socket closes #5794

Closed
clebergnu opened this issue Nov 9, 2023 · 1 comment
Closed

avocado-runner-* crashes when status server socket closes #5794

clebergnu opened this issue Nov 9, 2023 · 1 comment
Assignees
Labels

Comments

@clebergnu
Copy link
Contributor

Describe the bug
All runners built on top of avocado.core.nrunner.app.BaseRunnerApp will use TaskStatusService.post() to send status messages to the status server. If the socket get closed, there will be an EPIPE returned during send(), followed by a SIGPIPE, followed by a BrokenPipeError that is not handled, which results in a crash.

Steps to reproduce

  1. nc -l -U /tmp/socket &
  2. avocado-runner-rogue task-run -i ID -u x-avocado-runner-rogue -s /tmp/socket &
  3. kill %1

Expected behavior
The runners should be more resilient when the status server is not operating properly. Ignoring failures is not a good option, so maybe runners will need a way to queue and persist messages they can't initially send to status servers.

But known crashes should be prevented, and unknown crashes (with information such as backtraces) should be captured.

Current behavior

avocado-runner-rogue task-run -i ID -u x-avocado-runner-rogue -s /tmp/socket &
[1] 747815
{'status': 'started', 'time': 142105.739663621, 'output_dir': '/tmp/.avocado-task-djf9hljx', 'id': 'ID'}
{'status': 'running', 'time': 142105.739762377, 'id': 'ID'}
...
{'status': 'running', 'time': 142113.250716832, 'id': 'ID'}
Traceback (most recent call last):
  File "/home/cleber/.local/bin/avocado-runner-rogue", line 33, in <module>
    sys.exit(load_entry_point('avocado-rogue', 'console_scripts', 'avocado-runner-rogue')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cleber/src/avocado/avocado/examples/plugins/tests/rogue/avocado_rogue/runner.py", line 55, in main
    app.run()
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/app.py", line 179, in run
    return kallable(args)
           ^^^^^^^^^^^^^^
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/app.py", line 293, in command_task_run
    for status in task.run():
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/task.py", line 213, in run
    status_service.post(status)
  File "/home/cleber/src/avocado/avocado/avocado/core/nrunner/task.py", line 65, in post
    self.connection.send(data.encode("ascii") + "\n".encode("ascii"))
BrokenPipeError: [Errno 32] Broken pipe

System information (please complete the following information):

  • OS: LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch Distributor ID: Fedora Description: Fedora release 38 (Thirty Eight) Release: 38 Codename: ThirtyEight
  • Avocado version: a038820
  • Avocado installation method: Git

Additional information
This was found under #5788

@clebergnu clebergnu added the bug label Nov 9, 2023
@clebergnu clebergnu added this to the #104 - Codename TBD milestone Nov 9, 2023
@mr-avocado mr-avocado bot moved this to Triage in Default project Nov 9, 2023
@richtja richtja moved this from Triage to Short Term (Current Q) Backlog in Default project Nov 30, 2023
@richtja richtja self-assigned this Jan 8, 2024
@richtja richtja moved this from Short Term (Current Q) Backlog to In progress in Default project Jan 29, 2024
richtja added a commit to richtja/avocado that referenced this issue Jan 30, 2024
This commit adds error handling to TaskStatusService. When the
connection is lost, it will try to establish a new connection. If the
connection is not possible to renew, the task will send warning message
about new status and remove TaskStatusService from available services.

Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado that referenced this issue Jan 30, 2024
This commit adds error handling to TaskStatusService. When the
connection is lost, it will try to establish a new connection. If the
connection is not possible to renew, the task will send warning message
about new status and remove TaskStatusService from available services.

Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado that referenced this issue Jan 30, 2024
This commit adds error handling to TaskStatusService. When the
connection is lost, it will try to establish a new connection. If the
connection is not possible to renew, the task will send warning message
about new status and remove TaskStatusService from available services.

Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado that referenced this issue Jan 31, 2024
This commit adds error handling to TaskStatusService. When the
connection is lost, it will try to establish a new connection. If the
connection is not possible to renew, the task will send warning message
about new status and remove TaskStatusService from available services.

Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
Signed-off-by: Cleber Rosa <[email protected]>
@richtja
Copy link
Contributor

richtja commented Feb 2, 2024

Solved in #5860

@richtja richtja closed this as completed Feb 2, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done 104 in Default project Feb 2, 2024
clebergnu pushed a commit to clebergnu/avocado that referenced this issue Jun 11, 2024
This commit adds error handling to TaskStatusService. When the
connection is lost, it will try to establish a new connection. If the
connection is not possible to renew, the task will send warning message
about new status and remove TaskStatusService from available services.

Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
Signed-off-by: Cleber Rosa <[email protected]>
clebergnu pushed a commit to clebergnu/avocado that referenced this issue Jun 11, 2024
This commit adds error handling to TaskStatusService. When the
connection is lost, it will try to establish a new connection. If the
connection is not possible to renew, the task will send warning message
about new status and remove TaskStatusService from available services.

Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
Signed-off-by: Cleber Rosa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

No branches or pull requests

2 participants