[Feature]: improve distributed backend selection #8683

youkaichao · 2024-09-20T22:54:29Z

🚀 The feature, motivation and pitch

We have three ways to start a new process:

multiprocessing by fork
multiprocessing by spawn
ray

By default, we use ray for multi-node serving and multiprocessing by fork for the single-node setting.

However, if users have already initialized a CUDA context, multiprocessing by fork will not work.

If we make multiprocessing by spawn the default, it will not work when users' scripts don't have an if __name__ == "__main__" guard.

If we can automatically figure out whether users have if __name__ == "__main__", we can improve the default user experience.

The proposed solution is:

If we find that CUDA is initialized, we inspect the current call stack and trace it back until we reach the __main__ module, then check the current line there to see whether we are under if __name__ == "__main__". If so, we switch the multiprocessing method from fork to spawn.
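As a rough illustration of the stack inspection proposed above, here is a minimal sketch using only the standard library. The function names are hypothetical and this is not vLLM code. It assumes the guard is a plain top-level if statement and that the main module's source file can be read from disk, which is not true for interactive sessions.

```python
import ast
import inspect


def _is_main_guard(test: ast.expr) -> bool:
    """Return True for the expression __name__ == "__main__" (either order)."""
    if not (isinstance(test, ast.Compare) and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)):
        return False
    left, right = test.left, test.comparators[0]
    for a, b in ((left, right), (right, left)):
        if (isinstance(a, ast.Name) and a.id == "__name__"
                and isinstance(b, ast.Constant) and b.value == "__main__"):
            return True
    return False


def line_is_under_main_guard(source: str, lineno: int) -> bool:
    """Check whether lineno (1-based) lies in the body of a top-level guard."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.If) and _is_main_guard(node.test):
            first = node.body[0].lineno
            last = node.body[-1].end_lineno
            if first <= lineno <= last:
                return True
    return False


def called_from_guarded_main() -> bool:
    """Walk the call stack to the __main__ module and test its current line."""
    for info in inspect.stack(context=0):
        frame = info.frame
        if (frame.f_globals.get("__name__") == "__main__"
                and frame.f_code.co_name == "<module>"):
            try:
                with open(info.filename, encoding="utf-8") as f:
                    source = f.read()
                return line_is_under_main_guard(source, info.lineno)
            except (OSError, SyntaxError):
                return False  # no readable source, so assume unguarded
    return False
```

A caller could switch from fork to spawn only when called_from_guarded_main() returns True. The sketch misses guards written in other ways (for example an early exit on __name__ != "__main__") and code run without a source file, which hints at why a robust version is harder than it looks.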

cc @russellb

Alternatives

No response

Additional context

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.


russellb · 2024-09-23T21:14:37Z

If we find that CUDA is initialized, we inspect the current call stack and trace it back until we reach the __main__ module, then check the current line there to see whether we are under if __name__ == "__main__". If so, we switch the multiprocessing method from fork to spawn.

@youkaichao I looked into this a bit, and it doesn't seem straightforward. Let me know if you have a particular method in mind.

Another idea:

Set the default to spawn when the vllm CLI entry point is used. In that case, we know we are in full control, so we can set the default.
Add detection for the case where cuda is already initialized and the method is not spawn. In this case, add a warning log message to give clear instructions on what should be changed.
Expand developer documentation on the topic for those using the Python APIs.
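The second item in the list above could look roughly like the sketch below. The function name and the wording of the message are hypothetical, and this is not the code that was eventually merged. The caller passes in whether CUDA is initialized (with PyTorch that would typically come from torch.cuda.is_initialized()) and the start method that is about to be used.

```python
import logging

logger = logging.getLogger(__name__)


def warn_if_fork_after_cuda_init(cuda_initialized: bool,
                                 start_method: str) -> bool:
    """Log a warning when CUDA is initialized but workers will not be spawned.

    Returns True if a warning was logged, False otherwise.
    """
    if cuda_initialized and start_method != "spawn":
        logger.warning(
            "CUDA was initialized before worker processes were started, but "
            "the multiprocessing start method is %r. Use the 'spawn' start "
            "method, for example multiprocessing.set_start_method('spawn'), "
            "and guard the entry point of your script with "
            "if __name__ == '__main__'.",
            start_method,
        )
        return True
    return False
```

Returning a boolean keeps the check easy to test and lets the caller decide whether to do anything beyond logging.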

(for reference, this was a related PR where I was changing the default: #8576)

youkaichao · 2024-09-23T21:48:47Z

Add detection for the case where cuda is already initialized and the method is not spawn. In this case, add a warning log message to give clear instructions on what should be changed.

Then maybe you can add this. I think that would be enough.

russellb · 2024-09-24T12:52:36Z

OK - you can assign this to me.

russellb · 2024-09-25T20:20:31Z

I thought this was interesting. While testing out the behavior of different scenarios, I see that Python already does a nice job of detecting the case where spawn is used when the parent code isn't protected by if __name__ == "__main__".

...
  File "/usr/lib64/python3.11/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html

youkaichao added the feature request label Sep 20, 2024

youkaichao assigned russellb Sep 24, 2024

russellb mentioned this issue Sep 25, 2024

[Core] Improve choice of Python multiprocessing method #8823

Merged

DarkLight1337 closed this as completed in #8823 Sep 29, 2024
