avocado-runner-* crashes when status server socket closes #5794
richtja moved this from Short Term (Current Q) Backlog to In progress in Default project on Jan 29, 2024
richtja added a commit to richtja/avocado that referenced this issue on Jan 30, 2024
This commit adds error handling to TaskStatusService. When the connection is lost, it will try to establish a new connection. If the connection is not possible to renew, the task will send warning message about new status and remove TaskStatusService from available services.
Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado that referenced this issue on Jan 31, 2024
This commit adds error handling to TaskStatusService. When the connection is lost, it will try to establish a new connection. If the connection is not possible to renew, the task will send warning message about new status and remove TaskStatusService from available services.
Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
Signed-off-by: Cleber Rosa <[email protected]>
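The behavior this commit message describes (retry the connection once, then warn and drop the service) could look roughly like the sketch below. It is not the code from the actual commit: `ResilientStatusClient` and `post_to_all` are hypothetical names invented for illustration, and the real `TaskStatusService` may be structured differently.

```python
import socket
import warnings


class ResilientStatusClient:
    """Hypothetical stand-in for a status service (not avocado's class)."""

    def __init__(self, path):
        self.path = path          # path of the status server's UNIX socket
        self.connection = None

    def _connect(self):
        self.connection = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.connection.connect(self.path)

    def post(self, data):
        """Return True if data was sent, False if the server is unreachable."""
        for _attempt in range(2):  # the original try plus one reconnect
            try:
                if self.connection is None:
                    self._connect()
                self.connection.sendall(data)
                return True
            except OSError:
                # OSError covers BrokenPipeError, ConnectionResetError,
                # ConnectionRefusedError and FileNotFoundError.
                if self.connection is not None:
                    self.connection.close()
                self.connection = None
        return False


def post_to_all(services, data):
    """Post to every service; warn about and remove the ones that fail."""
    for service in list(services):
        if not service.post(data):
            warnings.warn(f"status server {service.path} is unreachable, "
                          "removing it from the available services")
            services.remove(service)
```

Returning a boolean from `post()` instead of raising keeps the decision about dropping a service in one place, which is one possible design and not necessarily the one the project chose.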
Solved in #5860
github-project-automation (bot) moved this from In progress to Done 104 in Default project on Feb 2, 2024
clebergnu pushed a commit to clebergnu/avocado that referenced this issue on Jun 11, 2024
This commit adds error handling to TaskStatusService. When the connection is lost, it will try to establish a new connection. If the connection is not possible to renew, the task will send warning message about new status and remove TaskStatusService from available services.
Reference: avocado-framework#5794
Signed-off-by: Jan Richter <[email protected]>
Signed-off-by: Cleber Rosa <[email protected]>
Describe the bug
All runners built on top of avocado.core.nrunner.app.BaseRunnerApp use TaskStatusService.post() to send status messages to the status server. If the socket gets closed, send() returns EPIPE, followed by a SIGPIPE and then a BrokenPipeError that is not handled, which crashes the runner.
Steps to reproduce
nc -l -U /tmp/socket &
avocado-runner-rogue task-run -i ID -u x-avocado-runner-rogue -s /tmp/socket &
kill %1
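The same failure can be shown without avocado. The sketch below uses a connected pair of UNIX stream sockets in place of the runner and the status server, which is an assumption made for illustration; the function name is hypothetical. CPython ignores SIGPIPE by default, so the write to the closed peer surfaces as a `BrokenPipeError` carrying `EPIPE`.

```python
import errno
import socket


def send_after_peer_close():
    """Send on a UNIX stream socket whose peer has already been closed."""
    runner_side, server_side = socket.socketpair(socket.AF_UNIX,
                                                 socket.SOCK_STREAM)
    server_side.close()  # simulates the status server going away
    try:
        runner_side.sendall(b'{"status": "running"}\n')
    except BrokenPipeError as exc:
        # Left unhandled, this is the exception that ends the process.
        return exc.errno
    finally:
        runner_side.close()
    return None


if __name__ == "__main__":
    print(send_after_peer_close() == errno.EPIPE)
```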
Expected behavior
The runners should be more resilient when the status server is not operating properly. Ignoring failures is not a good option, so maybe runners will need a way to queue and persist messages they can't initially send to status servers.
But known crashes should be prevented, and unknown crashes (with information such as backtraces) should be captured.
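One way to read "queue and persist" is an on-disk spool that keeps undelivered messages and retries them, in order, on the next post. The sketch below is only an illustration of that idea: `MessageSpool`, the JSON-lines file format, and the `send` callable (assumed to raise `OSError` on failure) are all hypothetical and not part of avocado.

```python
import json
import os
import tempfile


class MessageSpool:
    """Hypothetical on-disk queue for status messages that were not sent."""

    def __init__(self, path):
        self.path = path

    def pending(self):
        """Return the messages still waiting for delivery, oldest first."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as spool:
            return [json.loads(line) for line in spool if line.strip()]

    def _rewrite(self, messages):
        with open(self.path, "w", encoding="utf-8") as spool:
            for message in messages:
                spool.write(json.dumps(message) + "\n")

    def post(self, message, send):
        """Deliver older messages first, then this one; keep what fails."""
        queue = self.pending() + [message]
        while queue:
            try:
                send(queue[0])
            except OSError:
                break  # server still unavailable, keep the rest on disk
            queue.pop(0)
        self._rewrite(queue)
        return not queue


if __name__ == "__main__":
    spool = MessageSpool(os.path.join(tempfile.mkdtemp(), "spool.jsonl"))

    def server_down(_message):
        raise BrokenPipeError("status server is gone")

    spool.post({"status": "started"}, server_down)
    print(spool.pending())
```

The rewrite of the spool file here is not atomic, so a real implementation would also need to decide how to survive a crash in the middle of `_rewrite()`.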
Current behavior
System information (please complete the following information):
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: Fedora
Description: Fedora release 38 (Thirty Eight)
Release: 38
Codename: ThirtyEight
Additional information
This was found under #5788