[Bug]: Ray distributed backend does not support out-of-tree models via ModelRegistry APIs #5657
Comments
will take a look later

sorry, I don't get it. The usage of OOT model registration is that you register the architecture name appearing in the Hugging Face config file; see https://huggingface.co/facebook/opt-125m/blob/main/config.json#L6 for example.
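For concreteness, a minimal sketch of what that lookup means. The local directory path and the `"SomeModel"` name below are hypothetical; the `architectures` field is the part of config.json that vLLM matches against the registry (it is `["OPTForCausalLM"]` in the linked opt-125m config).

```python
# Sketch: the name passed to ModelRegistry.register_model must match an entry
# in the "architectures" list of the model directory's config.json.
import json

with open("path_to_directory/config.json") as f:  # hypothetical local model directory
    cfg = json.load(f)

print(cfg["architectures"])  # e.g. ["SomeModel"]; vLLM resolves this name via the model registry
```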
Yes, this is how I am using it:

```python
from vllm import ModelRegistry
from vllm.model_executor.models.mixtral import MixtralForCausalLM

ModelRegistry.register_model("SomeModel", MixtralForCausalLM)

from vllm import LLM, SamplingParams

if __name__ == "__main__":
    llm = LLM(
        model="path_to_directory/",  # directory whose config.json has architectures: ["SomeModel"]
        tensor_parallel_size=8,
        # distributed_executor_backend="ray",  # the ray backend fails!
    )
```
Then it makes sense to me. The registration code

```python
from vllm import ModelRegistry
from vllm.model_executor.models.mixtral import MixtralForCausalLM

ModelRegistry.register_model("SomeModel", MixtralForCausalLM)
```

is not executed in the Ray workers.
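As a sketch of why that matters: the driver and each Ray worker are separate processes that import vllm independently, so a registration done only in the driver is invisible to the workers. The snippet below is illustrative rather than an exact reproduction, and it assumes `ModelRegistry.get_supported_archs()` is available for inspecting the registry.

```python
import ray
from vllm import ModelRegistry
from vllm.model_executor.models.mixtral import MixtralForCausalLM

# Register the architecture only in the driver process.
ModelRegistry.register_model("SomeModel", MixtralForCausalLM)

@ray.remote
def registered_in_worker() -> bool:
    # The worker imports vllm fresh; it never executed the driver's register call.
    from vllm import ModelRegistry
    return "SomeModel" in ModelRegistry.get_supported_archs()

if __name__ == "__main__":
    ray.init()
    print("driver:", "SomeModel" in ModelRegistry.get_supported_archs())  # True
    print("worker:", ray.get(registered_in_worker.remote()))              # expected: False
```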
thanks!

@SamKG so the default backend (multiprocessing) should work out of the box, right?

@richardliaw Try the attached repro. Note that the default backend will also fail (but with an expected error), since I added a stub tensor to keep the model directory small. @youkaichao yes, the default backend works fine (as long as the OOT definition happens outside of `main`).

`ray.init(runtime_env={"worker_process_setup_hook": ...})` allows executing code on all workers. Would this suffice?

@rkooo567 this functionality seems related, but how can we expose it to users?
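For reference, a minimal Ray-only sketch of `worker_process_setup_hook`: the hook runs in every worker process before it executes tasks. The environment variable and function names are made up for illustration.

```python
import os
import ray

def _setup():
    # Runs once in each Ray worker process before any task executes.
    os.environ["EXAMPLE_FLAG"] = "1"  # hypothetical per-worker setup work

@ray.remote
def read_flag() -> str:
    return os.environ.get("EXAMPLE_FLAG", "unset")

if __name__ == "__main__":
    ray.init(runtime_env={"worker_process_setup_hook": _setup})
    print(ray.get(read_flag.remote()))  # expected: "1"
```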
This seems to fix the issue!

```python
import ray

from vllm import ModelRegistry, LLM


def _init_worker():
    from vllm.model_executor.models.mixtral import MixtralForCausalLM
    ModelRegistry.register_model("SomeModel", MixtralForCausalLM)


_init_worker()

if __name__ == "__main__":
    ray.init(runtime_env={"worker_process_setup_hook": _init_worker})
    llm = LLM(
        model="model/",
        tensor_parallel_size=8,
        distributed_executor_backend="ray",
    )
    llm.generate("test")
```
very nice! @youkaichao maybe we can just print out a warning linking to the vLLM docs about this? And in the vLLM docs let's have an example snippet like the one above!

Is there a plugin-based solution to the OOT model problem with tensor parallelism (other than using the Ray backend)? I can't find the relevant documentation for the plugin.
@chensiye-csy sorry, I didn't have time to write the docs yet, but you can follow https://github.com/vllm-project/vllm/tree/main/tests/plugins/vllm_add_dummy_model . It is fairly easy.
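A rough sketch of how such a plugin package is wired up, modeled on the linked test directory. The package name, module layout, and entry-point name below are illustrative, and the entry-point group `vllm.general_plugins` is the one the plugin mechanism is understood to scan; check the linked code for the authoritative version.

```python
# setup.py for a hypothetical out-of-tree model plugin package
from setuptools import setup

setup(
    name="vllm_add_dummy_model",        # illustrative package name
    version="0.1",
    packages=["vllm_add_dummy_model"],
    entry_points={
        # vLLM discovers and calls these entry points in every process it starts
        "vllm.general_plugins": [
            "register_dummy_model = vllm_add_dummy_model:register",
        ],
    },
)
```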
In fact, I have registered the OOT model according to the |
yes, this is exactly the problem the plugin will solve. The plugin function will be called in every vLLM process.
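Concretely, the plugin function referred to above can be just the registration call from earlier in this thread, for example something like the following sketch (the module path, `SomeModel` name, and Mixtral reuse all come from the snippets above and are illustrative):

```python
# vllm_add_dummy_model/__init__.py (hypothetical module providing the plugin function)
def register():
    # Invoked via the entry point in every vLLM process (driver and workers alike),
    # so the out-of-tree architecture is registered before model resolution.
    from vllm import ModelRegistry
    from vllm.model_executor.models.mixtral import MixtralForCausalLM

    ModelRegistry.register_model("SomeModel", MixtralForCausalLM)
```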
That's good! How can I use a plugin with an OOT model?
the easiest way: run
Your current environment
🐛 Describe the bug
The Ray distributed backend does not support out-of-tree models (on a single node).
Repro:
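A minimal reproduction consistent with the snippet shared later in the thread, assuming a local model directory whose config.json lists `architectures: ["SomeModel"]` (the path is hypothetical):

```python
from vllm import LLM, ModelRegistry
from vllm.model_executor.models.mixtral import MixtralForCausalLM

# Register the out-of-tree architecture name in the driver process only.
ModelRegistry.register_model("SomeModel", MixtralForCausalLM)

if __name__ == "__main__":
    llm = LLM(
        model="path_to_directory/",           # hypothetical local model directory
        tensor_parallel_size=8,
        distributed_executor_backend="ray",   # fails: Ray workers never see the registration
    )
    llm.generate("test")
```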