Hi,

I am working with a Flask WSGI application which, on startup, loads various pieces of data from a database. For example, it loads a list of laboratory numbers, which may run into many millions, to allow fast in-RAM lookups.

The challenge I am facing is that with startup commands like

gunicorn wsgi:app --bind 0.0.0.0:5000 --workers 8 --timeout 30

when multiple workers boot, they all request data from the database at the same time, and network/database bottlenecks make startup so slow that some (sometimes all) workers are killed for not becoming responsive ('alive') quickly enough. The Gunicorn master (arbiter) process then restarts the crashed workers, and the cycle repeats. In the most extreme form, as the amount of material loaded from the database grows, no worker ever starts.

An obvious workaround is to increase --timeout:

gunicorn wsgi:app --bind 0.0.0.0:5000 --workers 8 --timeout 90

This works in the short term, but it is not a good long-term solution, nor is it necessarily generalisable across (slower?) computing environments.

What I was wondering was whether there is, or would be a sensible use case for, a switch such as --start_incrementally that only starts the (n+1)th worker once the nth one has successfully launched, crashed, or timed out. I appreciate this might introduce all kinds of complexity, but I would be interested in views, and in other possible solutions to the problem described.

Thank you.
... or one of the other worker events, depending on how your application initialises. This would give you much finer-grained control over how to stagger worker forks. The call should block, so you can either serialise worker creation (e.g. by adding a sleep) or implement some other form of queueing or throttling.
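As a minimal sketch of that sleep-based staggering, in a gunicorn.conf.py; the 2-second delay is an arbitrary assumption to tune per environment, not a recommendation:

```python
# gunicorn.conf.py -- a minimal sketch of staggering worker forks.
import time

workers = 8
timeout = 90

def pre_fork(server, worker):
    # Runs in the arbiter immediately before each worker is forked.
    # Sleeping here spaces the forks out so the workers do not all
    # hit the database at the same moment.  Note that this blocks
    # the arbiter itself, which can have unintended side effects.
    time.sleep(2)
```
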
Edit: actually, it is wise to pay attention to this comment: #2693 (comment). Since the pre_fork event is, of course, pre-fork, blocking the arbiter even during startup could have unintended consequences. Other events such as post_fork are in the same boat. I don't know of an event that could be used to "slow down" or otherwise serialise application start-up ... couldn't you best orchestrate that in your application?
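One way to orchestrate it in the application is to serialise the expensive start-up load across workers with an exclusive file lock. A minimal sketch, assuming a POSIX system (the fcntl module); the lock path and the loader callable are illustrative, not part of any real API:

```python
# Application-side serialisation of the expensive start-up load.
import fcntl

LOCK_PATH = "/tmp/myapp-startup.lock"  # assumed path, shared by all workers

def load_reference_data(loader):
    """Run `loader` (the expensive database read) while holding an
    exclusive file lock, so concurrently booting workers queue up
    instead of hitting the database all at once."""
    with open(LOCK_PATH, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return loader()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each worker still loads its own copy into RAM; this only removes the simultaneous database contention, so the worker timeout must still cover one full sequential load.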