Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Master process should restart expired workers. #517

Closed
gnat opened this issue Dec 10, 2019 · 17 comments
Closed

Master process should restart expired workers. #517

gnat opened this issue Dec 10, 2019 · 17 comments

Comments

@gnat
Copy link

gnat commented Dec 10, 2019

Uvicorn stays alive when all workers are dead. Is this intended behaviour? Could someone explain the benefit of this?

Uvicorn does not seem to restart individual workers if they are dead. Therefore, shouldn't Uvicorn itself exit when there are zero workers left?

A container orchestrator could automatically reload Uvicorn on exit, which would be awesome.

Important

  • We're using Polar.sh so you can upvote and help fund this issue.
  • We receive the funding once the issue is completed & confirmed by you.
  • Thank you in advance for helping prioritize & fund our backlog.
Fund with Polar
@tomchristie
Copy link
Member

Uvicorn does not seem to restart individual workers if they are dead. Therefore, shouldn't Uvicorn itself exit when there are zero workers left?

Really what we'd like to do here is have the master process restart the child processes.

@tomchristie tomchristie changed the title Uvicorn stays alive when all workers are dead. Master process should restart expired workers. Dec 10, 2019
@gnat
Copy link
Author

gnat commented Dec 10, 2019

Either sounds good!

Also would Uvicorn restarting workers obsolete the usage of Gunicorn + Uvicorn workers? Or is Gunicorn also providing additional value that I'm not aware of?

@tomchristie
Copy link
Member

@gnat For many users, yes.

I think we'd probably want to spin the GunicornWorker out into a seperatly managed package at that point, and in general just recommend running uvicorn directly.

Gunicorn gives you some more advanced options wrt. signal handling and restarts, but most users probably don't actually need that.

@sposs
Copy link

sposs commented Jun 22, 2020

Hi,
For me the status of this issue is problematic: on the one hand you tell users in the doc to use Uvicorn directly (which I did), and advertise the option --limit-max-requests (which I used for safety), but on the other hand, the master process does not restart the stopped processes, and it's recommended here to actually use Gunicorn. Which is it? Shouldn't the doc be changed to mention that the --limit-max-requests does not actually restart the stopped processes? In which case I believe a little more detail is needed to be able to manually restart them... Or, maybe the doc could indicate that this option should not be used and Gunicorn be used instead to provide the same functionality?

@sposs
Copy link

sposs commented Jun 22, 2020

Actually, I should specify that I used supervisor to run the master process, with the following conf

[program:checklist_server]
command=uvicorn --workers 4 backend.asgi:application --limit-max-requests 1000
user=service
autostart=true
autorestart=true
redirect_stderr=true
stopasgroup=true
killasgroup=true

and if the child processes are stopped, as the master still runs, the service is not restarted...

@diwu1989
Copy link

diwu1989 commented Oct 2, 2020

The current behavior is very non-obvious and not what gunicorn users expect.
Please change this to auto-restart.

@cb109
Copy link

cb109 commented Apr 22, 2021

This behaviour is the opposite of what I would expect to happen. The --limit-max-requests mechanic seems helpful to avoid workers going stale e.g. with memory leaks or such. But what good is a uvicorn parent process with all worker processes dead and never restarted? This seems to be a misconception that should be fixed to restart workers automatically after they hit the request limit.

If that's not feasible or out of scope I'd vote to remove the flag altogether honestly and document how to achieve the same with running under gunicorn.

@Kludex
Copy link
Member

Kludex commented Sep 28, 2021

I'll be working on this.

@Kludex Kludex added this to the Version 1.0 milestone Sep 28, 2021
@Kludex
Copy link
Member

Kludex commented May 15, 2022

I have worked on this. The PR was ready (#1205), but there was not enough motivation to review, and I lost the strength to continue with that PR.

Gunicorn is the way to go here.

@Kludex Kludex closed this as completed May 15, 2022
@gnat
Copy link
Author

gnat commented May 15, 2022

With all due respect @Kludex you're awesome and we highly appreciate your effort on this, but this is not your ticket to close.

If you cannot continue your effort, this issue should go back to unassigned and/or be taken out of the 1.0 milestone, but far too many uvicorn users are encountering this and it still must be documented as an issue.

@tomchristie thoughts?

@Kludex
Copy link
Member

Kludex commented May 16, 2022

We already document that gunicorn should be used in production if using multiple workers.

PR is welcome to make it clear that the master process don't restart the dead workers.

When I say a lost strength is about trying to convince others, not about effort...

@Kludex Kludex removed this from the Version 1.0 milestone May 16, 2022
@gnat
Copy link
Author

gnat commented May 16, 2022

Sorry to hear that.

We already document that gunicorn should be used in production if using multiple workers.

No offense intended, but based on how this was originally triaged, it sounds like uvicorn wasn't supposed to remain a single-worker test server into 1.0.

Before the solution was transformed into a full process manager, many folks would be quite happy with:

  1. Workers are dead? (Or % of)
  2. Gracefully exit Uvicorn so it can be restarted.

This is dead simple and would address the immediate issue. This would be acceptable to anyone running Uvicorn under:

  • systemd
  • docker
  • bash script that auto restarts uvicorn

This is the 90% solution, would be KISS, and we can revisit a full blown process manager in the future. We could go into 1.0 with confidence.

@Kludex
Copy link
Member

Kludex commented May 16, 2022

How is that "dead simple", and how it cannot be solved using gunicorn? Single-worker test server?

You keep mentioning 1.0, but I was the one who added this issue to that milestone... There is no one else supporting this to achieve that milestone.

I'll even go further...

The way I see it, we have three possible paths:

  • deprecate workers feature, and motivate the gunicorn usage.
  • further develop workers feature (this issue).
  • don't do anything, document better.

As we discussed, cc @euri10 ..

@mrob95
Copy link

mrob95 commented Feb 25, 2024

Hi, I just hit this in production.

Due to a misconfigured caching policy our uvicorn workers were hitting memory limits and getting killed, then never recovering. Once the workers that were started on start-up ran out, the app hung indefinitely.

An easy repro for this is to put the following into a main.py:

# pip install fastapi uvicorn

from fastapi import FastAPI
import os
import signal

app = FastAPI()

@app.get("/")
def die():
    os.kill(os.getpid(), signal.SIGKILL)

Then run:

uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

When the server gets hit, one of the workers dies. Once we run out of workers the server hangs:

while true; do echo "Sending request..."; curl http://localhost:8000; done
Sending request...
curl: (52) Empty reply from server
Sending request...
curl: (52) Empty reply from server
Sending request...
curl: (52) Empty reply from server
Sending request...
curl: (52) Empty reply from server
Sending request...
... (hangs indefinitely)

There are no indications in the server logs that anything went wrong.

I have fixed this problem by using gunicorn as a process manager as recommended here:

https://fastapi.tiangolo.com/deployment/server-workers/#gunicorn-with-uvicorn-workers

I'm surprised that a bug like this would be left in a web server implementation for 4+ years. Hanging without servicing requests or exiting is pretty much the worst thing a server can do.

If a robust fix is complicated to implement, I think it would be good to at least emphasise in the start-up logs (as the flask debug server does) that uvicorn alone is not intended for production usage.

@gnat
Copy link
Author

gnat commented Feb 25, 2024

@mrob95 the lines you need in your Uvicorn to do the job: #1942

I'm surprised that a bug like this would be left in a web server implementation for 4+ years. Hanging without servicing requests or exiting is pretty much the worst thing a server can do.

Yup, it's bad.

@silverjam
Copy link

[..]

  • deprecate workers feature, and motivate the gunicorn usage.
    [..]
  • don't do anything, document better.

FWIW we got led down this path too by a desire to use the --limit-concurrency feature to make sure we kept memory usage under control in our service and eventually ran into the issue with workers hanging mentioned by @mrob95 because we were also using --limit-max-requests.

I would recommend the following path forward to mitigate the issue:

  • Deprecate --workers and warn (in code and docs) that --limit-max-requests should not be used in combination with it (or explicitly disallow the combination of --limit-max-requests and --workers).
  • Document more explicitly a recommended approach for accessing Uvicorn specific options via Gunicorn (such as the concurrency limits) -- e.g. something like this StackOverflow post or whatever is more appropriate/idiomatic.

@Kludex
Copy link
Member

Kludex commented May 28, 2024

This will be available in 0.30.0.

@Kludex Kludex closed this as completed May 28, 2024
br3ndonland added a commit to br3ndonland/inboard that referenced this issue Dec 22, 2024
This project supports the Gunicorn web server. Gunicorn's server design
includes a primary "arbiter" process that spawns "worker" processes.
Workers are child processes, each with their own running server. Workers
are implemented as Python classes. Custom workers can be supplied.

This project also supports the Uvicorn web server. In the past, Uvicorn
supplied workers for use with Gunicorn. The Uvicorn workers were not
tested. The `uvicorn.workers` module was completely omitted from
coverage measurement due to use of the coverage.py `include` setting
to specify source files.

Efforts were made to test the Uvicorn workers
(encode/uvicorn#1834,
encode/uvicorn#1995),
but the workers were arbitrarily deprecated and moved to
someone's personal project (encode/uvicorn#2302),
instead of an Encode-managed project as would have been expected
(encode/uvicorn#517 (comment))

Rather than introducing a production dependency on a separate Uvicorn
workers package that is not managed by Encode, this commit will add the
Gunicorn workers directly to this project.

This commit will add the code from `uvicorn.workers` to a new module
`inboard/gunicorn_workers.py`. The code will be preserved as it was
prior to deprecation, with a copy of the Uvicorn license and necessary
updates for compliance with the code quality settings in this project:

- Ruff
  [UP008](https://docs.astral.sh/ruff/rules/super-call-with-parameters/)
- Ruff
  [UP035](https://docs.astral.sh/ruff/rules/deprecated-import/)
- mypy `[attr-defined]` - Module "uvicorn.main" does not explicitly
  export attribute "Server"
- mypy `[import-untyped]` - `gunicorn.arbiter`
- mypy `[import-untyped]` - `gunicorn.workers.base`
- mypy `[misc]` - Class cannot subclass "Worker" (has type "Any")
- mypy `[type-arg]` - Missing type parameters for generic type "dict"
  (on `config_kwargs`)

This commit will also add tests of 100% of the Gunicorn worker code to a
new module `tests/test_gunicorn_workers.py`.

A test fixture starts a subprocess running Gunicorn with a Uvicorn worker
and an ASGI app. The subprocess includes an instance of `httpx.Client`
for HTTP requests to the Uvicorn worker's ASGI app, and saves its output
to a temporary file for assertions on `stdout`/`stderr`. Tests can send
operating system signals to the process.

The coverage.py configuration will be updated for subprocess test
coverage measurement. Changes to coverage measurement include:

- Enable the required parallel mode (note that it is important to ensure
  the `.gitignore` ignores files named `.coverage.*` because many coverage
  files are generated when subprocesses are measured in parallel mode)
- Set the required `COVERAGE_PROCESS_START` environment variable
- Add the `coverage_enable_subprocess` package to invoke
  `coverage.process_startup`
- Combine coverage reports before reporting coverage
- Add instructions to `contributing.md` about how to omit subprocess
  tests

Related:

https://github.com/encode/uvicorn/blob/4fd507718eb0313e2de66123e6737b054088f722/LICENSE.md
https://github.com/encode/uvicorn/blob/4fd507718eb0313e2de66123e6737b054088f722/uvicorn/workers.py
encode/uvicorn#517 (comment)
encode/uvicorn#1834
encode/uvicorn#1995
encode/uvicorn#2302

https://coverage.readthedocs.io/en/latest/subprocess.html
https://docs.gunicorn.org/en/latest/design.html
https://docs.gunicorn.org/en/latest/signals.html
https://www.uvicorn.org/deployment/#gunicorn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants