
[BUG] Exit at startup failure on multiple workers #1115

Closed

Kludex opened this issue Jul 10, 2021 · 17 comments · Fixed by #1177
Comments

@Kludex
Member

Kludex commented Jul 10, 2021

Checklist

  • The bug is reproducible against the latest release and/or master.
  • There are no similar issues or pull requests to fix it yet.

Describe the bug

When startup fails, either in reload mode or with multiple workers, the main process is not terminated.

To reproduce

Application:

# test.py
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
def startup():
    raise Exception("Hi")

Run with:

uvicorn test:app --workers 2
# or
uvicorn test:app --reload

Expected behavior

All the processes (parent and children) should be terminated.

Actual behavior

❯ uvicorn test:app --reload
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [40026] using watchgod
INFO:     Started server process [40028]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 540, in lifespan
    async for item in self.lifespan_context(app):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 481, in default_lifespan
    await self.startup()
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 518, in startup
    handler()
  File "/home/marcelo/Development/./test.py", line 7, in startup
    raise Exception("Hi")
Exception: Hi

ERROR:    Application startup failed. Exiting.

After this log, it hangs forever. In reload mode, nothing prevents the failure from recurring every time a watched file is modified. See:

❯ uvicorn test:app --reload            
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [40674] using watchgod
INFO:     Started server process [40676]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 540, in lifespan
    async for item in self.lifespan_context(app):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 481, in default_lifespan
    await self.startup()
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 518, in startup
    handler()
  File "/home/marcelo/Development/./test.py", line 7, in startup
    raise Exception("Hi")
Exception: Hi

ERROR:    Application startup failed. Exiting.
WARNING:  WatchGodReload detected file change in '['/home/marcelo/Development/test.py']'. Reloading...
INFO:     Started server process [40734]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 540, in lifespan
    async for item in self.lifespan_context(app):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 481, in default_lifespan
    await self.startup()
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 518, in startup
    handler()
  File "/home/marcelo/Development/./test.py", line 7, in startup
    raise Exception("Hi")
Exception: Hi

ERROR:    Application startup failed. Exiting.

In the case of multiple workers, it just hangs forever.

Environment

  • OS / Python / Uvicorn version: Running uvicorn 0.14.0 with CPython 3.8.10 on Linux
@humrochagf
Contributor

Oh, nice to know. I bumped into this issue while writing some tests to improve coverage; I wasn't sure whether it was expected behaviour or a bug, so I added a note to check it later when I had more time.

This change will probably require a minor version bump instead of a patch, since someone may be relying on this hanging behaviour in their daemons. Even though the app isn't really up, carelessly configured daemons can peg the CPU with infinite loops of app restarts.

In my tests, I noticed it hangs even earlier in the startup process, for example if you misconfigure the app.

It fails to load the app and hangs instead of exiting:

➜ uvicorn test --workers 2 
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started parent process [15068]
ERROR:    Error loading ASGI app. Import string "test" must be in format "<module>:<attribute>".
ERROR:    Error loading ASGI app. Import string "test" must be in format "<module>:<attribute>".

The same for the reload case:

➜ uvicorn test --reload   
INFO:     Will watch for changes in these directories: ['/home/user/tests/server']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [15234] using watchgod
ERROR:    Error loading ASGI app. Import string "test" must be in format "<module>:<attribute>".
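The misconfigured-import failure above happens before any server code runs, and the format check itself is simple. As a hedged sketch (the `load_app` helper below is hypothetical, not uvicorn's actual implementation), a parent process could run such a check before spawning workers, so a bad string is fatal up front:

```python
import importlib

def load_app(import_string: str):
    # Validate the "<module>:<attribute>" format before importing anything,
    # so a bad string fails fast in the parent instead of in each worker.
    module_name, sep, attr = import_string.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(
            f'Import string "{import_string}" must be in format '
            '"<module>:<attribute>".'
        )
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```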

@humrochagf
Contributor

More info on this case: in my tests, when passing a wrong import string, the workers die at the following lines of the configuration loading:

https://github.com/encode/uvicorn/blob/master/uvicorn/config.py#L449-L451

The problem with crashes that happen during configuration loading is that the communication channel with the parent process isn't set up yet at that point, so the parent never gets notified.

Trying to send some notifications after the lifespan is set up doesn't seem to have much effect either:

https://github.com/encode/uvicorn/blob/master/uvicorn/server.py#L77-L79

Do you know the proper way to communicate a shutdown event to the parent process? I couldn't find a good IPC example in the codebase to try out in this situation.
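One possible direction for the question above (a sketch only, not uvicorn's actual mechanism; the `_worker` and `supervise` helpers are hypothetical): on Unix, the parent does not strictly need an explicit message from the child, because each child's process sentinel becomes ready the moment the child exits. The parent can wait on the sentinels and tear everything down as soon as any worker dies during startup:

```python
import multiprocessing
from multiprocessing.connection import wait

def _worker(ok: bool) -> None:
    # Stand-in for a uvicorn worker; raising simulates a startup failure.
    if not ok:
        raise RuntimeError("startup failed")

def supervise(num_workers: int, ok: bool = True) -> int:
    # Fork context keeps the sketch self-contained on Unix.
    ctx = multiprocessing.get_context("fork")
    procs = [ctx.Process(target=_worker, args=(ok,)) for _ in range(num_workers)]
    for p in procs:
        p.start()
    # A process sentinel becomes "ready" when the process exits, so the
    # parent notices a dead worker without any explicit IPC from the child.
    pending = {p.sentinel: p for p in procs}
    failed = False
    while pending and not failed:
        for sentinel in wait(list(pending)):
            proc = pending.pop(sentinel)
            proc.join()
            failed = failed or proc.exitcode != 0
    # Tear down any workers that are still running, then report the result.
    for p in procs:
        if p.is_alive():
            p.terminate()
        p.join()
    return 1 if failed else 0
```

This is roughly what gunicorn's Arbiter achieves with signals and exit codes; the eventual uvicorn fix may look quite different.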

@Kludex
Member Author

Kludex commented Aug 21, 2021

I would check how gunicorn and hypercorn are doing it.

@humrochagf
Contributor

For more info: I checked both Hypercorn and Gunicorn. Hypercorn seems to have an issue similar to the one described here.

Gunicorn works perfectly, but there's a difference in how it handles workers: Gunicorn has an Arbiter that is responsible for managing and keeping the workers alive, and it maintains tight control over each worker's state.

One thing that could be borrowed from there is to make sure all lifespan setup happens while the worker is being spawned, instead of having parts of its loading happen during the call to serve. That way you guarantee a communication line with the worker is already set up, so the parent can be shut down if something goes wrong during startup.

@Kludex
Member Author

Kludex commented Sep 30, 2021

The reload part is fixed, but not the multiple workers one.

@Kludex Kludex reopened this Sep 30, 2021
@danieldaeschle

danieldaeschle commented Dec 1, 2021

Is there any update on this? I have the same problem: when the app crashes, Kubernetes does not recognize it and therefore does not restart it.

Is there maybe a workaround?

@Kludex
Member Author

Kludex commented Dec 1, 2021

The fix for reload is on master; we need to release it.

@danieldaeschle

I need it for multiple workers. I don't use reload in production.

@Kludex
Member Author

Kludex commented Dec 1, 2021

You shouldn't be using uvicorn's --workers feature in production. The recommendation is to run standalone uvicorn on k8s without workers and leave process management to a higher level; but if you still want multiple workers, you should use gunicorn.

In any case, a workaround can be found on the first commit of #1177.
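For completeness, the gunicorn route suggested above uses uvicorn's worker class (`uvicorn.workers.UvicornWorker`, documented in uvicorn's deployment docs). A sketch of the command, reusing the `test:app` example from this issue:

```shell
# Run two uvicorn workers under gunicorn; gunicorn's Arbiter supervises the
# workers and is designed to abort if they repeatedly fail to boot.
gunicorn test:app --workers 2 --worker-class uvicorn.workers.UvicornWorker
```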

@danieldaeschle

Thanks for that hint

@Kludex
Member Author

Kludex commented Jan 25, 2022

It's fine to fail and not exit in reload mode. The problem with --workers persists.

@Korijn

Korijn commented Feb 2, 2022

Just want to point out that this is still an issue on 0.17.0.post1.

@Kludex
Member Author

Kludex commented Feb 2, 2022

The conclusion here is that it's fine to fail and not exit in reload mode. It's a feature.

@Kludex Kludex changed the title [BUG] Exit at startup failure on multiple workers and reload [BUG] Exit at startup failure on multiple workers Feb 2, 2022
@Korijn

Korijn commented Feb 2, 2022

Well... I don't understand: it doesn't recover and is stuck in that state permanently. To be clear, I'm talking about the reload scenario described in the opening post, not the workers scenario.

What part of this is a feature? Does it help me debug in some way?

@Korijn

Korijn commented Feb 3, 2022

So, to be completely clear: I'm running the reload example from the top of this issue:

[screenshots of the reload run]

Same outcome as before. Hangs forever. Is that really the accepted situation?

@Kludex
Member Author

Kludex commented Feb 3, 2022

Yes.

@Kludex
Member Author

Kludex commented May 15, 2022

Just to give more details... There are two ways to see this problem:

  1. We don't reload the application when an exception happens at startup, and we exit.
  2. We reload the application when an exception happens at startup, and we see the process waiting for updates.

The thing is that option 2 gives a better user experience.

In other words, there's no issue with reload in that sense. But it's a problem on the workers side.

Well... I'm closing this because I don't think many people are using workers in production, but if someone wants to fix the workers side, feel free to open a PR...

@Kludex Kludex closed this as completed May 15, 2022