question: Is it possible to avoid ray in single machine multiple GPUs serving? #391
Comments
At the moment, Ray is needed if you want to use tensor parallelism with multiple GPUs, since each Worker instance must live in its own process rather than a thread. However, Ray could in principle be replaced with Python's multiprocessing for this purpose (see the sketch below). I haven't seen any other reason why Ray is needed in the code; there may be something else, for example memory management or object sharing.
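A minimal sketch of the multiprocessing idea, assuming one worker process per GPU on a single machine; `worker_main` and `world_size` are hypothetical stand-ins for whatever per-GPU Worker setup the engine actually does, not vLLM's real API:

```python
import multiprocessing as mp
import os


def worker_main(rank: int, world_size: int) -> None:
    # Pin this process to one GPU, playing the role the Ray actor would
    # otherwise play: one Worker per process, not per thread.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    # A real tensor-parallel worker would initialize a process group here,
    # e.g. torch.distributed.init_process_group("nccl", rank=rank,
    # world_size=world_size), before loading its model shard.
    print(f"worker {rank}/{world_size} running in pid {os.getpid()}")


if __name__ == "__main__":
    world_size = 2  # assumed number of GPUs on the machine
    ctx = mp.get_context("spawn")  # CUDA requires the spawn start method
    procs = [
        ctx.Process(target=worker_main, args=(rank, world_size))
        for rank in range(world_size)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Whether this covers everything Ray provides (scheduling, object transfer between workers, fault handling) is exactly the open question in this issue.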
Closing, because it appears Ray is only used when tensor parallelism across multiple GPUs is requested.
I'm uncertain whether it's feasible to bypass Ray when serving on a single machine with multiple GPUs. Ray introduces additional maintenance costs in this use case.