Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle SIGTERM received by server gracefully #7948

Merged
merged 60 commits into from
Mar 1, 2023
Merged

Conversation

ddelange
Copy link
Contributor

@ddelange ddelange commented Dec 20, 2022

Closes #7938 by adding a SIGTERM signal handler

Example

Checklist

  • This pull request references any related issue by including "closes <link to issue>"
    • If no issue exists and your change is not a small fix, please create an issue first.
  • This pull request includes tests or only affects documentation.
  • This pull request includes a label categorizing the change e.g. fix, feature, enhancement

@ddelange ddelange requested a review from zanieb as a code owner December 20, 2022 06:34
@netlify
Copy link

netlify bot commented Dec 20, 2022

Deploy Preview for prefect-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 8763521
🔍 Latest deploy log https://app.netlify.com/sites/prefect-docs/deploys/63ff91d5383e0b0007f932a3
😎 Deploy Preview https://deploy-preview-7948--prefect-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@ddelange
Copy link
Contributor Author

@madkinsz how could I add a test Call SIGTERM handler expect a SIGTERM? I guess a new test block is needed, but what to change? It's not obvious to me how to choose which signal is sent, there's just the assertion 🤔

https://github.com/PrefectHQ/prefect/blob/2.7.3/tests/utilities/test_processutils.py#L136-L139

@serinamarie
Copy link
Contributor

Hi @ddelange, with the change that you made, did you see a difference in shutdown behavior? I tried this locally and when I SIGTERM the Uvicorn process I don't see a change in effect.

@ddelange
Copy link
Contributor Author

No I have not tested it actually, just a guess and question from the mobile website 😅

@serinamarie
Copy link
Contributor

serinamarie commented Dec 23, 2022

Hi @ddelange, no worries, as it turns out, sending a SIGTERM to the prefect process with your change worked as expected! However, we probably don't want forwarding of SIGTERM inside kill_on_interrupt, but instead to forward a SIGTERM directly. This in turn would mean creating a new test file inside of tests/cli/ called test_orion_start.py 🙂 instead of in test_processutils.py.

@ddelange
Copy link
Contributor Author

Any help with such a test would be highly appreciated, from the code comments in the current test: I'm not sure how to properly test the 'end to end' variation with pytest 🤔

@@ -337,3 +337,4 @@ def kill_process(*args):
os.kill(pid, signal.SIGKILL)

signal.signal(signal.SIGINT, stop_process)
signal.signal(signal.SIGTERM, stop_process)
Copy link
Contributor

@zanieb zanieb Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want a new utility called forward_signal(signum: int, pid: int) that'd look roughly like

previous_signal_handler = ...
def forward_signal
   os.kill(pid, signum)
   previous_signal_handler(..,)
   
signal.signal(sugnum, forward_signal)

You'll want to move call this from the prefect orion start command

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To test the change to the CLI, you can run prefect orion start in a subprocess (using popen or similar) then send a SIGTERM to it. It should then exit gracefully.

Copy link
Contributor Author

@ddelange ddelange Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing sounds good.

as for your utility suggestion, i think im stumbling over something. probably insufficient understanding of signals :)

do you mean a generalisation of the utility? sigterm calls signint calls sigkill? so recursion?

i think i dont understand previous_signal_handler 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this something you'd like to implement, and could tag me on so i can learn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and by testing "exit gracefully", do you mean to poll the subprocess exit code and assert it is 1 within a second or two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do i understand correctly from the comment in the code that you tried builtin anyio signal handling, but it didn't work nicely together with click?

Copy link
Contributor Author

@ddelange ddelange Dec 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi 👋 I took the time to put together the code and tests as I think you intended. The tests do a wait for prefect orion start subprocess to spin up a uvicorn worker, send two signals shortly after each other, and assert by the process output that the correct signals were received:

[
    b'\n',
    b' ___ ___ ___ ___ ___ ___ _____    ___  ___ ___ ___  _  _\n',
    b'| _ \\ _ \\ __| __| __/ __|_   _|  / _ \\| _ \\_ _/ _ \\| \\| |\n',
    b'|  _/   / _|| _|| _| (__  | |   | (_) |   /| | (_) | .` |\n',
    b'|_| |_|_\\___|_| |___\\___| |_|    \\___/|_|_\\___\\___/|_|\\_|\n',
    b'\n',
    b'Configure Prefect to communicate with the server with:\n',
    b'\n',
    b'    prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api\n',
    b'\n',
    b'View the API reference documentation at http://0.0.0.0:4200/docs\n',
    b'\n',
    b"The dashboard is not built. It looks like you're on a development version.\n",
    b'See `prefect dev` for development commands.\n',
    b'\n',
    b'\n',
    b'\n',
    b'11:15:44.690 | INFO    | uvicorn.error - Started server process [89342]\n',
    b'11:15:44.693 | INFO    | uvicorn.error - Waiting for application startup.\n',
    b'11:15:44.993 | INFO    | prefect.orion - Scheduler service scheduled to start in-app\n',
    b'11:15:44.993 | INFO    | prefect.orion - RecentDeploymentsScheduler service scheduled to start in-app\n',
    b'11:15:44.994 | INFO    | prefect.orion - MarkLateRuns service scheduled to start in-app\n',
    b'11:15:44.994 | INFO    | prefect.orion - FailExpiredPauses service scheduled to start in-app\n',
    b'11:15:44.995 | INFO    | prefect.orion - Telemetry service scheduled to start in-app\n',
    b'11:15:44.995 | INFO    | prefect.orion - FlowRunNotifications service scheduled to start in-app\n',
    b'11:15:45.013 | INFO    | uvicorn.error - Application startup complete.\n',
    b'11:15:45.017 | INFO    | uvicorn.error - Uvicorn running on http://0.0.0.0:4200 (Press CTRL+C to quit)\n',
    b'\n',
    b'Sending SIGTERM to Orion (PID 89342)...\n',
    b'11:15:45.107 | INFO    | prefect.orion.services.recentdeploymentsscheduler - Scheduled 0 runs.\n',
    b'\n',
    b'Sending SIGKILL to Orion (PID 89342)...\n',
    b'Orion stopped!\n'
]

Copy link
Contributor Author

@ddelange ddelange Dec 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added another test for the graceful shutdown. it looks really graceful 😄

 ___ ___ ___ ___ ___ ___ _____    ___  ___ ___ ___  _  _
| _ \ _ \ __| __| __/ __|_   _|  / _ \| _ \_ _/ _ \| \| |
|  _/   / _|| _|| _| (__  | |   | (_) |   /| | (_) | .` |
|_| |_|_\___|_| |___\___| |_|    \___/|_|_\___\___/|_|\_|

Configure Prefect to communicate with the server with:

    prefect config set PREFECT_API_URL=http://0.0.0.0:4200/api

View the API reference documentation at http://0.0.0.0:4200/docs

The dashboard is not built. It looks like you're on a development version.
See `prefect dev` for development commands.



11:38:26.690 | INFO    | uvicorn.error - Started server process [90293]
11:38:26.693 | INFO    | uvicorn.error - Waiting for application startup.
11:38:27.006 | INFO    | prefect.orion - Scheduler service scheduled to start in-app
11:38:27.007 | INFO    | prefect.orion - RecentDeploymentsScheduler service scheduled to start in-app
11:38:27.007 | INFO    | prefect.orion - MarkLateRuns service scheduled to start in-app
11:38:27.008 | INFO    | prefect.orion - FailExpiredPauses service scheduled to start in-app
11:38:27.008 | INFO    | prefect.orion - Telemetry service scheduled to start in-app
11:38:27.009 | INFO    | prefect.orion - FlowRunNotifications service scheduled to start in-app
11:38:27.029 | INFO    | uvicorn.error - Application startup complete.
11:38:27.031 | INFO    | uvicorn.error - Uvicorn running on http://0.0.0.0:4200 (Press CTRL+C to quit)

Sending SIGTERM to Orion (PID 90293)...
11:38:27.147 | INFO    | uvicorn.error - Shutting down
11:38:27.152 | INFO    | prefect.orion.services.recentdeploymentsscheduler - Scheduled 0 runs.
11:38:27.155 | INFO    | prefect.orion.services.scheduler - Scheduled 0 runs.
11:38:27.156 | INFO    | prefect.orion.services.failexpiredpauses - Finished monitoring for late runs.
11:38:27.158 | INFO    | prefect.orion.services.marklateruns - Finished monitoring for late runs.
11:38:27.161 | INFO    | prefect.orion - FlowRunNotifications service stopped!
11:38:27.250 | INFO    | uvicorn.error - Waiting for application shutdown.
11:38:27.781 | INFO    | prefect.orion - Telemetry service stopped!
11:38:28.159 | INFO    | prefect.orion - RecentDeploymentsScheduler service stopped!
11:38:28.160 | INFO    | prefect.orion - Scheduler service stopped!
11:38:28.161 | INFO    | prefect.orion - FailExpiredPauses service stopped!
11:38:28.162 | INFO    | prefect.orion - MarkLateRuns service stopped!
11:38:28.175 | INFO    | uvicorn.error - Application shutdown complete.
11:38:28.176 | INFO    | uvicorn.error - Finished server process [90293]
Orion stopped!

@ddelange ddelange requested a review from zanieb January 4, 2023 10:48
@ddelange
Copy link
Contributor Author

ddelange commented Jan 4, 2023

@serinamarie @madkinsz ready for review 👍

Copy link
Contributor

@zanieb zanieb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zanieb zanieb added the enhancement An improvement of an existing feature label Jan 6, 2023
@ddelange
Copy link
Contributor Author

ddelange commented Jan 6, 2023

did I pull a flakey test?

@zanieb
Copy link
Contributor

zanieb commented Jan 6, 2023

Looks like tests/cli/test_start_orion.py::TestUvicornSignalForwarding::test_sigterm_sends_sigterm_directly - subprocess.TimeoutExpired: Command '['prefect', 'orion', 'start', '--host', '0.0.0.0', '--port', '4200', '--log-level', 'INFO']' timed out after 2 seconds is failing/hanging.

@ddelange
Copy link
Contributor Author

ddelange commented Jan 7, 2023

thanks, somehow missed that. bumped the timeout, which is more of a failsafe anyway

@ddelange
Copy link
Contributor Author

ddelange commented Jan 7, 2023

ran that particular test again on my machine, from the point of sending SIGTERM it needs a little over a second to reach Orion stopped!

@zanieb zanieb changed the title Handle SIGTERM received by orion-server gracefully Handle SIGTERM received by server gracefully Feb 22, 2023
@zanieb
Copy link
Contributor

zanieb commented Feb 22, 2023

@ddelange thanks for updating this with main! I appreciate that you're still on top of it. @bunchesofdonald would you have time to review this?

@ddelange
Copy link
Contributor Author

wohoo we're green 💥

zanieb and others added 6 commits February 24, 2023 10:15
* 'main' of https://github.com/prefecthq/prefect:
  Bump @playwright/test from 1.30.0 to 1.31.1 in /ui (PrefectHQ#8663)
  Bump @prefecthq/prefect-ui-library from 1.1.9 to 1.1.10 in /ui (PrefectHQ#8661)
  Bump vite from 4.1.2 to 4.1.4 in /ui (PrefectHQ#8662)
  Bump eslint from 8.34.0 to 8.35.0 in /ui (PrefectHQ#8664)
  Fix of a typo - removed repeated word (PrefectHQ#8654)
  Enable DefaultAzureCredential authentication for Azure filesystem block (PrefectHQ#7513)
  Improve supervisor repr for debugging (PrefectHQ#8633)
@ddelange
Copy link
Contributor Author

ddelange commented Mar 1, 2023

@madkinsz can it be that these new tests are making existing tests flaky?

@zanieb
Copy link
Contributor

zanieb commented Mar 1, 2023

@ddelange I'm not sure, I took on #8680 / #8679 / #8681 in an attempt to fix some flakes I saw here.

@zanieb zanieb merged commit c3dd617 into PrefectHQ:main Mar 1, 2023
@zanieb
Copy link
Contributor

zanieb commented Mar 1, 2023

Thanks for your patience!

ddelange added a commit to ddelange/prefect that referenced this pull request Mar 2, 2023
…eful-agent

* 'main' of https://github.com/prefecthq/prefect:
  BugFix: Add information about task inputs (PrefectHQ#8295)
  Handle SIGTERM received by server gracefully (PrefectHQ#7948)
@br3ndonland br3ndonland mentioned this pull request Jun 4, 2023
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An improvement of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Orion server ignores SIGTERM termination signal from docker/k8s
3 participants