Support AMD (Rocm kernel) #360
base: main
Conversation
@@ -1,3 +1,13 @@
data
Should these be deleted?
author='Bytedance - Seed - MLSys',
author_email='zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk',
description='veRL: Volcano Engine Reinforcement Learning for LLM',
# install_requires=install_requires,
Just wondering, is the setup file ready? It seems that it doesn't load requirements_amd.txt.
@PeterSH6 Guangming
If users want to run on AMD GPUs, they should follow this AMD doc to install veRL.
Set up the environment:
pip install -r requirements_amd_no_deps.txt --no-deps
pip install -r requirements_amd.txt
I was thinking of merging this manual installation process into setup.py so that users can finish installing with a single pip install -e . command:

(1) I can modify setup.py to have two parts: (1.1) one that detects AMD Torch and follows the installation process I provided above, and (1.2) another that executes the original setup.py logic if AMD Torch is not detected.

(2) I previously tried to modify setup.py as described in (1), but it still executed the original setup.py. It seems likely that I also need to make some modifications to pyproject.toml to enable (1).

Do you think (1) is a good idea? Also, I'm not very familiar with solving (2). If you have time, could we quickly sync offline to figure out how to handle this part?
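For concreteness, here is a minimal sketch of what (1) might look like. It assumes ROCm builds of PyTorch expose torch.version.hip, and that requirements.txt / requirements_amd.txt sit next to setup.py; the detection logic and file names are illustrative, not the actual patch in this PR:

```python
# Hypothetical branching setup.py: pick the AMD requirements file when a ROCm
# build of PyTorch is detected, otherwise fall back to the original behavior.
from pathlib import Path
from setuptools import setup, find_packages


def is_rocm_torch() -> bool:
    """Return True if the installed torch is a ROCm (AMD) build."""
    try:
        import torch
        # ROCm builds of PyTorch set torch.version.hip to a version string;
        # CUDA/CPU builds leave it as None.
        return getattr(torch.version, "hip", None) is not None
    except ImportError:
        return False


def read_requirements(name: str) -> list[str]:
    """Parse a pip requirements file into a list of specifiers."""
    text = Path(__file__).resolve().parent.joinpath(name).read_text()
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


requirements_file = "requirements_amd.txt" if is_rocm_torch() else "requirements.txt"

setup(
    name="verl",
    packages=find_packages(),
    install_requires=read_requirements(requirements_file),
)
```

One caveat: the --no-deps step for requirements_amd_no_deps.txt cannot be expressed through install_requires, which may be part of why additional pyproject.toml changes would be needed.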
verl/third_party/vllm/__init__.py (Outdated)
vllm_version = '0.6.4'
from .vllm_v_0_6_4_rocm624.llm import LLM
from .vllm_v_0_6_4_rocm624.llm import LLMEngine
from .vllm_v_0_6_4_rocm624 import parallel_state
It seems that the relevant ROCm vLLM is not uploaded yet.
I wonder if AMD supports vLLM > 0.7 at the moment?
@PeterSH6 Guangming
- It was ignored; let me add it back. (You can see it now.)
- At the same time, I'm testing vLLM > 0.7, since AMD does not seem to have released a vLLM > 0.7-ready version yet. I'm figuring this out and working with the AMD vLLM team now.
I compiled vllm==0.7.3 on MI300 and tried it with verl. It works if I disable vLLM's sleep feature, but vLLM does not yet support sleep mode on AMD devices.
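For reference, a hedged sketch of the workaround being described, assuming the enable_sleep_mode engine flag introduced in vLLM >= 0.7 (the model name is a placeholder, and how verl actually wires this up may differ):

```python
# Illustrative only: build a vLLM engine with the sleep/wake feature disabled,
# since sleep mode is reported not to work on AMD devices yet.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",  # placeholder model name
    enable_sleep_mode=False,         # skip the unsupported sleep feature
)
```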
Nice work!

@yushengsu-thu conflicts need to be solved
This codebase supports AMD GPUs.
[Bug free]
[Passes test cases]: run_qwen2-7b_seq_balance.sh, run_qwen2-7b_rm_seq_balance.sh
Tutorial:
Special thanks for the collaboration and help from:
(SGLang) @zhaochenyang20
(VeRL) @PeterSH6
(AMD) @yushengsu-thu @vickytsang @xiaodoyu
(AnyScale) @hongpeng-guo @kevin85421