-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle SIGTERM received by server gracefully #7948
Conversation
✅ Deploy Preview for prefect-docs ready!Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site settings. |
@madkinsz how could I add a test |
Hi @ddelange, with the change that you made, did you see a difference in shutdown behavior? I tried this locally and when I SIGTERM the Uvicorn process I don't see a change in effect. |
No I have not tested it actually, just a guess and question from the mobile website 😅 |
Hi @ddelange, no worries, as it turns out, sending a SIGTERM to the prefect process with your change worked as expected! However, we probably don't want forwarding of SIGTERM inside |
Any help with such a test would be highly appreciated, from the code comments in the current test: I'm not sure how to properly test the 'end to end' variation with pytest 🤔 |
@@ -337,3 +337,4 @@ def kill_process(*args): | |||
os.kill(pid, signal.SIGKILL) | |||
|
|||
signal.signal(signal.SIGINT, stop_process) | |||
signal.signal(signal.SIGTERM, stop_process) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want a new utility called forward_signal(signum: int, pid: int)
that'd look roughly like
previous_signal_handler = ...
def forward_signal
os.kill(pid, signum)
previous_signal_handler(..,)
signal.signal(sugnum, forward_signal)
You'll want to move call this from the prefect orion start
command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test the change to the CLI, you can run prefect orion start
in a subprocess (using popen
or similar) then send a SIGTERM to it. It should then exit gracefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing sounds good.
as for your utility suggestion, i think im stumbling over something. probably insufficient understanding of signals :)
do you mean a generalisation of the utility? sigterm calls signint calls sigkill? so recursion?
i think i dont understand previous_signal_handler
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something you'd like to implement, and could tag me on so i can learn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and by testing "exit gracefully", do you mean to poll the subprocess exit code and assert it is 1 within a second or two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do i understand correctly from the comment in the code that you tried builtin anyio signal handling, but it didn't work nicely together with click?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi 👋 I took the time to put together the code and tests as I think you intended. The tests do a wait for prefect orion start
subprocess to spin up a uvicorn worker, send two signals shortly after each other, and assert by the process output that the correct signals were received:
[
b'\n',
b' ___ ___ ___ ___ ___ ___ _____ ___ ___ ___ ___ _ _\n',
b'| _ \\ _ \\ __| __| __/ __|_ _| / _ \\| _ \\_ _/ _ \\| \\| |\n',
b'| _/ / _|| _|| _| (__ | | | (_) | /| | (_) | .` |\n',
b'|_| |_|_\\___|_| |___\\___| |_| \\___/|_|_\\___\\___/|_|\\_|\n',
b'\n',
b'Configure Prefect to communicate with the server with:\n',
b'\n',
b' prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api\n',
b'\n',
b'View the API reference documentation at http://0.0.0.0:4200/docs\n',
b'\n',
b"The dashboard is not built. It looks like you're on a development version.\n",
b'See `prefect dev` for development commands.\n',
b'\n',
b'\n',
b'\n',
b'11:15:44.690 | INFO | uvicorn.error - Started server process [89342]\n',
b'11:15:44.693 | INFO | uvicorn.error - Waiting for application startup.\n',
b'11:15:44.993 | INFO | prefect.orion - Scheduler service scheduled to start in-app\n',
b'11:15:44.993 | INFO | prefect.orion - RecentDeploymentsScheduler service scheduled to start in-app\n',
b'11:15:44.994 | INFO | prefect.orion - MarkLateRuns service scheduled to start in-app\n',
b'11:15:44.994 | INFO | prefect.orion - FailExpiredPauses service scheduled to start in-app\n',
b'11:15:44.995 | INFO | prefect.orion - Telemetry service scheduled to start in-app\n',
b'11:15:44.995 | INFO | prefect.orion - FlowRunNotifications service scheduled to start in-app\n',
b'11:15:45.013 | INFO | uvicorn.error - Application startup complete.\n',
b'11:15:45.017 | INFO | uvicorn.error - Uvicorn running on http://0.0.0.0:4200 (Press CTRL+C to quit)\n',
b'\n',
b'Sending SIGTERM to Orion (PID 89342)...\n',
b'11:15:45.107 | INFO | prefect.orion.services.recentdeploymentsscheduler - Scheduled 0 runs.\n',
b'\n',
b'Sending SIGKILL to Orion (PID 89342)...\n',
b'Orion stopped!\n'
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added another test for the graceful shutdown. it looks really graceful 😄
___ ___ ___ ___ ___ ___ _____ ___ ___ ___ ___ _ _
| _ \ _ \ __| __| __/ __|_ _| / _ \| _ \_ _/ _ \| \| |
| _/ / _|| _|| _| (__ | | | (_) | /| | (_) | .` |
|_| |_|_\___|_| |___\___| |_| \___/|_|_\___\___/|_|\_|
Configure Prefect to communicate with the server with:
prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api
View the API reference documentation at http://0.0.0.0:4200/docs
The dashboard is not built. It looks like you're on a development version.
See `prefect dev` for development commands.
11:38:26.690 | INFO | uvicorn.error - Started server process [90293]
11:38:26.693 | INFO | uvicorn.error - Waiting for application startup.
11:38:27.006 | INFO | prefect.orion - Scheduler service scheduled to start in-app
11:38:27.007 | INFO | prefect.orion - RecentDeploymentsScheduler service scheduled to start in-app
11:38:27.007 | INFO | prefect.orion - MarkLateRuns service scheduled to start in-app
11:38:27.008 | INFO | prefect.orion - FailExpiredPauses service scheduled to start in-app
11:38:27.008 | INFO | prefect.orion - Telemetry service scheduled to start in-app
11:38:27.009 | INFO | prefect.orion - FlowRunNotifications service scheduled to start in-app
11:38:27.029 | INFO | uvicorn.error - Application startup complete.
11:38:27.031 | INFO | uvicorn.error - Uvicorn running on http://0.0.0.0:4200 (Press CTRL+C to quit)
Sending SIGTERM to Orion (PID 90293)...
11:38:27.147 | INFO | uvicorn.error - Shutting down
11:38:27.152 | INFO | prefect.orion.services.recentdeploymentsscheduler - Scheduled 0 runs.
11:38:27.155 | INFO | prefect.orion.services.scheduler - Scheduled 0 runs.
11:38:27.156 | INFO | prefect.orion.services.failexpiredpauses - Finished monitoring for late runs.
11:38:27.158 | INFO | prefect.orion.services.marklateruns - Finished monitoring for late runs.
11:38:27.161 | INFO | prefect.orion - FlowRunNotifications service stopped!
11:38:27.250 | INFO | uvicorn.error - Waiting for application shutdown.
11:38:27.781 | INFO | prefect.orion - Telemetry service stopped!
11:38:28.159 | INFO | prefect.orion - RecentDeploymentsScheduler service stopped!
11:38:28.160 | INFO | prefect.orion - Scheduler service stopped!
11:38:28.161 | INFO | prefect.orion - FailExpiredPauses service stopped!
11:38:28.162 | INFO | prefect.orion - MarkLateRuns service stopped!
11:38:28.175 | INFO | uvicorn.error - Application shutdown complete.
11:38:28.176 | INFO | uvicorn.error - Finished server process [90293]
Orion stopped!
Signed-off-by: ddelange <[email protected]>
Signed-off-by: ddelange <[email protected]>
Signed-off-by: ddelange <[email protected]>
@serinamarie @madkinsz ready for review 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
did I pull a flakey test? |
Looks like |
thanks, somehow missed that. bumped the timeout, which is more of a failsafe anyway |
ran that particular test again on my machine, from the point of sending SIGTERM it needs a little over a second to reach |
@ddelange thanks for updating this with |
wohoo we're green 💥 |
* 'main' of https://github.com/prefecthq/prefect: Bump @playwright/test from 1.30.0 to 1.31.1 in /ui (PrefectHQ#8663) Bump @prefecthq/prefect-ui-library from 1.1.9 to 1.1.10 in /ui (PrefectHQ#8661) Bump vite from 4.1.2 to 4.1.4 in /ui (PrefectHQ#8662) Bump eslint from 8.34.0 to 8.35.0 in /ui (PrefectHQ#8664) Fix of a typo - removed repeated word (PrefectHQ#8654) Enable DefaultAzureCredential authentication for Azure filesystem block (PrefectHQ#7513) Improve supervisor repr for debugging (PrefectHQ#8633)
@madkinsz can it be that these new tests are making existing tests flaky? |
Thanks for your patience! |
…eful-agent * 'main' of https://github.com/prefecthq/prefect: BugFix: Add information about task inputs (PrefectHQ#8295) Handle SIGTERM received by server gracefully (PrefectHQ#7948)
Closes #7938 by adding a SIGTERM signal handler
Example
Checklist
<link to issue>
"fix
,feature
,enhancement