
[BUG] Exit at startup failure on multiple workers #1115

Closed

Kludex opened this issue Jul 10, 2021 · 17 comments · Fixed by #1177
Comments

@Kludex
Member

Kludex commented Jul 10, 2021

Checklist

  • The bug is reproducible against the latest release and/or master.
  • There are no similar issues or pull requests to fix it yet.

Describe the bug

When startup fails, either in reload mode or with multiple workers, the main process is not terminated.

To reproduce

Application:

# test.py
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
def startup():
    raise Exception("Hi")

Run with:

uvicorn test:app --workers 2
# or
uvicorn test:app --reload

Expected behavior

All the processes (parent and children) should be terminated.

Actual behavior

❯ uvicorn test:app --reload
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [40026] using watchgod
INFO:     Started server process [40028]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 540, in lifespan
    async for item in self.lifespan_context(app):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 481, in default_lifespan
    await self.startup()
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 518, in startup
    handler()
  File "/home/marcelo/Development/./test.py", line 7, in startup
    raise Exception("Hi")
Exception: Hi

ERROR:    Application startup failed. Exiting.

After this log, it hangs forever. In reload mode, nothing prevents the failure from recurring every time a watched file is modified. See:

❯ uvicorn test:app --reload            
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [40674] using watchgod
INFO:     Started server process [40676]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 540, in lifespan
    async for item in self.lifespan_context(app):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 481, in default_lifespan
    await self.startup()
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 518, in startup
    handler()
  File "/home/marcelo/Development/./test.py", line 7, in startup
    raise Exception("Hi")
Exception: Hi

ERROR:    Application startup failed. Exiting.
WARNING:  WatchGodReload detected file change in '['/home/marcelo/Development/test.py']'. Reloading...
INFO:     Started server process [40734]
INFO:     Waiting for application startup.
ERROR:    Traceback (most recent call last):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 540, in lifespan
    async for item in self.lifespan_context(app):
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 481, in default_lifespan
    await self.startup()
  File "/home/marcelo/anaconda3/envs/uvicorn/lib/python3.8/site-packages/starlette/routing.py", line 518, in startup
    handler()
  File "/home/marcelo/Development/./test.py", line 7, in startup
    raise Exception("Hi")
Exception: Hi

ERROR:    Application startup failed. Exiting.

In the case of multiple workers, it just hangs forever.

Environment

  • OS / Python / Uvicorn version: Running uvicorn 0.14.0 with CPython 3.8.10 on Linux
@humrochagf
Contributor

Oh, nice to know. I bumped into this issue while writing some tests to improve coverage; I wasn't sure whether it was expected behaviour or a bug, so I added a note to check it later when I had more time.

This change will probably require a minor version bump instead of a patch, since someone may be relying on this hanging behaviour in their daemons. Even though the app isn't really up, carelessly configured daemons can peg the CPU with infinite loops of app restarts.

In my tests, I noticed it hangs even earlier in the startup process, for example if you misconfigure the app.

It fails to load the app and hangs instead of exiting:

➜ uvicorn test --workers 2 
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started parent process [15068]
ERROR:    Error loading ASGI app. Import string "test" must be in format "<module>:<attribute>".
ERROR:    Error loading ASGI app. Import string "test" must be in format "<module>:<attribute>".

The same for the reload case:

➜ uvicorn test --reload   
INFO:     Will watch for changes in these directories: ['/home/user/tests/server']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [15234] using watchgod
ERROR:    Error loading ASGI app. Import string "test" must be in format "<module>:<attribute>".
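The misconfigured-import failure above happens before any server code runs, and the format check itself is simple. As a hedged sketch (the `load_app` helper below is hypothetical, not uvicorn's actual implementation), a parent process could run such a check before spawning workers, so a bad string is fatal up front:

```python
import importlib

def load_app(import_string: str):
    # Validate the "<module>:<attribute>" format before importing anything,
    # so a bad string fails fast in the parent instead of in each worker.
    module_name, sep, attr = import_string.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(
            f'Import string "{import_string}" must be in format '
            '"<module>:<attribute>".'
        )
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```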

@humrochagf
Contributor

More info on this case: in my tests, when passing a wrong import string, the workers die at the following lines of the configuration loading:

https://github.com/encode/uvicorn/blob/master/uvicorn/config.py#L449-L451

The problem with crashes that happen during configuration loading is that the communication channel with the parent process isn't set up yet at that point, so the parent never gets notified.

Trying to send some notifications after the lifespan is set up doesn't seem to have much effect either:

https://github.com/encode/uvicorn/blob/master/uvicorn/server.py#L77-L79

Do you know the proper way to communicate a shutdown event to the parent process? I couldn't find a good IPC example in the codebase to try out in this situation.
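One possible direction for the question above (a sketch only, not uvicorn's actual mechanism; the `_worker` and `supervise` helpers are hypothetical): on Unix, the parent does not strictly need an explicit message from the child, because each child's process sentinel becomes ready the moment the child exits. The parent can wait on the sentinels and tear everything down as soon as any worker dies during startup:

```python
import multiprocessing
from multiprocessing.connection import wait

def _worker(ok: bool) -> None:
    # Stand-in for a uvicorn worker; raising simulates a startup failure.
    if not ok:
        raise RuntimeError("startup failed")

def supervise(num_workers: int, ok: bool = True) -> int:
    # Fork context keeps the sketch self-contained on Unix.
    ctx = multiprocessing.get_context("fork")
    procs = [ctx.Process(target=_worker, args=(ok,)) for _ in range(num_workers)]
    for p in procs:
        p.start()
    # A process sentinel becomes "ready" when the process exits, so the
    # parent notices a dead worker without any explicit IPC from the child.
    pending = {p.sentinel: p for p in procs}
    failed = False
    while pending and not failed:
        for sentinel in wait(list(pending)):
            proc = pending.pop(sentinel)
            proc.join()
            failed = failed or proc.exitcode != 0
    # Tear down any workers that are still running, then report the result.
    for p in procs:
        if p.is_alive():
            p.terminate()
        p.join()
    return 1 if failed else 0
```

This is roughly what gunicorn's Arbiter achieves with signals and exit codes; the eventual uvicorn fix may look quite different.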

@Kludex
Member Author

Kludex commented Aug 21, 2021

I would check how gunicorn and hypercorn are doing it.

@humrochagf
Contributor

For more info: I checked both Hypercorn and Gunicorn. Hypercorn seems to have an issue similar to the one described here.

Gunicorn works perfectly, but there's a difference in how it handles workers: Gunicorn has an Arbiter that is responsible for managing and keeping the workers alive, and it maintains tight control over each worker's state.

One thing that could be borrowed from there is to make sure all lifespan setup happens while the worker is being spawned, instead of having parts of its loading happen during the call to serve. That way you guarantee a communication line with the worker is already set up, so the parent can be shut down if something goes wrong during startup.

@Kludex
Member Author

Kludex commented Sep 30, 2021

The reload part is fixed, but not the multiple workers one.

@Kludex Kludex reopened this Sep 30, 2021
@danieldaeschle

danieldaeschle commented Dec 1, 2021

Is there any update on this? I have the same problem: when the app crashes, Kubernetes does not recognize it and therefore does not restart it.

Is there maybe a workaround?

@Kludex
Member Author

Kludex commented Dec 1, 2021

The fix for reload is on master; we need to release it.

@danieldaeschle

I need it for multiple workers. I don't use reload in production.

@Kludex
Member Author

Kludex commented Dec 1, 2021

You shouldn't be using uvicorn's --workers feature in production. The recommendation is to run standalone uvicorn on k8s without workers and leave process management to a higher level; but if you still want multiple workers, you should use gunicorn.

In any case, a workaround can be found on the first commit of #1177.
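For completeness, the gunicorn route suggested above uses uvicorn's worker class (`uvicorn.workers.UvicornWorker`, documented in uvicorn's deployment docs). A sketch of the command, reusing the `test:app` example from this issue:

```shell
# Run two uvicorn workers under gunicorn; gunicorn's Arbiter supervises the
# workers and is designed to abort if they repeatedly fail to boot.
gunicorn test:app --workers 2 --worker-class uvicorn.workers.UvicornWorker
```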

@danieldaeschle

Thanks for that hint

@Kludex
Member Author

Kludex commented Jan 25, 2022

It's fine to fail and not exit in reload mode. The problem with --workers persists.

@Korijn

Korijn commented Feb 2, 2022

Just want to point out that this is still an issue on 0.17.0.post1.

@Kludex
Member Author

Kludex commented Feb 2, 2022

The conclusion here is that it's fine to fail and not exit in reload mode. It's a feature.

@Kludex Kludex changed the title [BUG] Exit at startup failure on multiple workers and reload [BUG] Exit at startup failure on multiple workers Feb 2, 2022
@Korijn

Korijn commented Feb 2, 2022

Well... I don't understand: it doesn't recover and is stuck in that state permanently. To be clear, I'm talking about the reload scenario described in the opening post, not the workers scenario.

What part of this is a feature? Does it help me debug in some way?

@Korijn

Korijn commented Feb 3, 2022

So, to be completely clear: I'm running the reload example from the top of this issue:

[screenshots of the reload run]

Same outcome as before. Hangs forever. Is that really the accepted situation?

@Kludex
Member Author

Kludex commented Feb 3, 2022

Yes.

@Kludex
Member Author

Kludex commented May 15, 2022

Just to give more details... There are two ways to see this problem:

  1. We don't reload the application when an exception happens at startup, and we exit.
  2. We reload the application when an exception happens at startup, and we see the process waiting for updates.

The thing is that option 2 gives a better user experience.

In other words, there's no issue with reload in that sense. But it's a problem on the workers side.

Well... I'm closing this because I don't think many people are using workers in production, but if someone wants to fix the workers side, feel free to open a PR...

@Kludex Kludex closed this as completed May 15, 2022