Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create ProcessManager #1205

Closed
wants to merge 28 commits into from
Closed

Create ProcessManager #1205

wants to merge 28 commits into from

Conversation

Kludex
Copy link
Member

@Kludex Kludex commented Oct 2, 2021

It's a draft, but comments about the NOTES.md are welcome. :)
Reviews are welcome.

Summary

The idea of this PR is to get a process manager that will kill and spawn the child processes as needed.

The issue #517 is the first that introduced the idea.

Changes

Decisions

  • should_exit and force_exit aren't the only ones controlling the shutdown anymore. I've also added terminate_called inspired by this article, and after further thinking, it does make sense. As the signal will be propagated to the workers (and the master wants to let the worker know it should terminate). - This is actually a bug that happens pre this PR as you can see on [BUG] Received duplicated signal from Gunicorn #1116, but I've just come with this solution when creating this PR, which means I can also create another PR for it.

Issues

Posterior work

  • Worker health check.
  • Accept --pidfile.
  • Accept --graceful-timeout.
  • Add further signal handling.

Q&A that I've come across during development

What happens if the startup event fails on a single worker?

The manager is notified and then terminate gracefully the workers i.e. it will give time to the other workers to perform the shutdown event.

Should we have an "age" attribute on the manager?

Age is used to choose which worker to kill.

Why is there a random delay on gunicorn spawn_workers function?

IDK

Should we implement the maybe_promote_worker logic? With USR2 signal?

Yes! References:
- https://docs.gunicorn.org/en/stable/signals.html#upgrading-to-a-new-binary-on-the-fly
- benoitc/gunicorn#1267

On the signal() method on the Arbiter, why is there a limit of 5 signals on the queue?

IDK

Do I need to have a health check? I see that on gunicorn, a temporary file is written from time to time by the workers, so the manager is able to detect if the workers are alive or not.

Yes, we need to have a health check. Not sure if we can use a PIPE instead of a temporary file here.

Do we need KeyboardInterruption exception catch on the main loop?

IDK

References

Design highly inspired by Gunicorn! Reviewers can use this to help.

@Kludex Kludex changed the title Add ProcessManager Create ProcessManager Oct 2, 2021
@Kludex Kludex marked this pull request as ready for review October 3, 2021 18:37

def reap_processes(self) -> None:
for process in self.processes:
if not process.is_alive():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to further check if the wait() here is enough to reap! It worked on the cases I've tested, but I'm not sure if there are edge cases!

@Kludex
Copy link
Member Author

Kludex commented Oct 3, 2021

I still need to add tests, but I'd like reviews in the meantime. I think it has the minimal workable setup that I wanted, and the Posterior work section above list what I want to do next.

@@ -11,4 +11,9 @@
except ImportError: # pragma: no cover
from uvicorn.supervisors.statreload import StatReload as ChangeReload

if sys.platform == "win32":
from uvicorn.supervisors.multiprocess import Multiprocess
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are plans to support Windows, I would be happy to help. (I read the code carefully, it is not difficult to support Windows)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My problem is with the signals not supported by Windows, what do you expect to do on that part? I just added the minimal signal handlers, but further work should not work.

Copy link
Member

@abersheeran abersheeran Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use process.terminate() replace SIGINT. Other signals are transmitted as they are.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, a large number of other signals cannot be triggered on Windows. We can give Windows users a near-identical development experience as long as we can handle the signals raised by Ctrl-C in a unified manner.

Copy link
Member Author

@Kludex Kludex Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not referring to SIGINT, SIGINT and SIGTERM are not an issue, and I don't see any issue of calling process.terminate() on SIGINT. I mean, I don't think there's an issue on that part for windows users.

The issue right now is the init_signals() method, that adds a handler for signals that Windows doesn't have. If we just add a conditional to create SIGNALS attribute according to the OS, then it will work. The issue is with the SIGCHLD, that Windows doesn't support.

Copy link
Member

@abersheeran abersheeran Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue right now is the init_signals() method, that adds a handler for signals that Windows doesn't have. If we just add a conditional to create SIGNALS attribute according to the OS, then it will work. The issue is with the SIGCHLD, that Windows doesn't support.

https://github.com/Kludex/uvicorn/blob/a4377a14e3360fa487c506b49e8c5537fb0057a1/uvicorn/supervisors/manager.py#L71-L74

Maybe don't need SIGCHLD in Windows? The delay of 0.25 is not high.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "development experience" I imagine the user wanting to use the reload feature, which should not be recommended here? 🤔

I mean, if the user wants to use the class implemented here in production, I imagine that it also wants to use the reload feature in development.

Yes, I agree. It is a good idea to incorporate the reload here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not referring to SIGINT, SIGINT and SIGTERM are not an issue, and I don't see any issue of calling process.terminate() on SIGINT. I mean, I don't think there's an issue on that part for windows users.

#684 Sometime, os.kill() maybe got an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/Kludex/uvicorn/blob/a4377a14e3360fa487c506b49e8c5537fb0057a1/uvicorn/supervisors/manager.py#L71-L74

Maybe don't need SIGCHLD in Windows? The delay of 0.25 is not high.

I see! You mean that it's redundant to have the handler as we already perform the same logic in case we don't handle the SIGCHLD.

I need to recheck if there's further logic to be implemented on the SIGCHLD handler that I didn't add because it's not needed on this initial step. If I can't find anything, then yes, we can remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. It is a good idea to incorporate the reload here.

I was not defending the idea of adding the reload here, as I think it's a posterior discussion. But I do agree with that, if we do that, we're going to have only this ProcessManager class, and both the Reload classes and the Multiprocess class should be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants