chore(robot-server): Work around tests hanging because server shutdown is hanging #16317
It's interesting that cherry-picking #16171 didn't fix the problem, which means this might be a different issue.
Nice! Thank you for fixing this!
I'm going to merge this now because:
I agree that we should understand what's actually going on here, and we can continue investigation in EXEC-716.
Sounds good! Thanks for investigating this!
Just remembered that @y3rsh had a PR up to force-kill uvicorn processes if a regular shutdown failed. Do you think it's worth adding that as well, or does it not matter since the CI flow will be terminated after test cancellation anyway? It might be a good option if we just want the tests to pass right now and the "wait for analyses some more" workaround is taking a really long time.
Ooh. It's reasonable to auto-kill the process after a timeout, but I think it should count as a test failure, not as a success. Basically, it'd turn the GitHub-Action-level timeout into a test-specific timeout. I wouldn't want the auto-kill to accidentally cover up other shutdown hang bugs.
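A minimal sketch of the "auto-kill, but count it as a failure" idea, assuming the test harness runs the server as a child process started with `subprocess.Popen`. `stop_server` and `ServerShutdownTimeout` are hypothetical names for illustration, not existing robot-server test helpers. A clean shutdown returns the exit code; a hang is still cleaned up, but it raises so the test fails instead of passing silently.

```python
import subprocess


class ServerShutdownTimeout(Exception):
    """Hypothetical error: the server hung on shutdown and had to be killed."""


def stop_server(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Stop a server subprocess, force-killing it if shutdown hangs.

    A clean shutdown returns the process's exit code. A hang is cleaned up
    (so CI doesn't sit until the job-level timeout), but it is reported as
    an error so the test fails rather than masking the shutdown bug.
    """
    proc.terminate()  # polite request: SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the process so it doesn't linger
        raise ServerShutdownTimeout(
            f"server did not shut down within {timeout} s and was killed"
        )
```

Called from a pytest fixture's teardown, an exception here shows up as a test error, which keeps other shutdown hangs visible.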
Overview
This tries to work around robot-server tests hanging on `edge`'s CI checks.
Test Plan and Hands on Testing
See if CI passes now.
Details
It seems like server shutdown hangs when there is an ongoing analysis. I think this is the same problem that @sanni-t identified in #16171. Unfortunately, simply cherry-picking #16171 into `edge` does not solve the problem.
The last lines from server shutdown:
I think resolving this properly will involve merging `chore_release-8.0.0` into `edge` and then rethinking how `ProtocolAnalyzer` + `RunOrchestrator` + `ProtocolEngine` get cleaned up.
The workaround in this PR is to make the tests wait for all analyses to complete before restarting the server.
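A minimal sketch of the "wait for all analyses to complete" step, assuming the test can fetch analysis summaries as dicts with a `status` field that is either `"pending"` or `"completed"`. That shape is loosely modeled on robot-server's analysis summaries and should be treated as an assumption; `wait_until` and `all_analyses_completed` are hypothetical helpers, not names from this PR.

```python
import time
from typing import Callable, Iterable, Mapping


def all_analyses_completed(summaries: Iterable[Mapping[str, str]]) -> bool:
    """True when no analysis is still running.

    Assumption: each summary has a "status" key of "pending" or "completed".
    An empty list counts as "nothing pending".
    """
    return all(s.get("status") == "completed" for s in summaries)


def wait_until(
    predicate: Callable[[], bool], timeout: float, interval: float = 0.1
) -> None:
    """Poll `predicate` until it returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

In the real tests, the predicate would wrap whatever call fetches analysis status from the running server, and the restart would happen only after `wait_until` returns. Bounding the wait with a timeout keeps a stuck analysis from reintroducing the same hang one step earlier.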
Review requests
Does this workaround make sense for now?
Risk assessment
No risk. Changes are just to tests.