
Support AMD (Rocm kernel) #360

Open
yushengsu-thu wants to merge 6 commits into main

Conversation


@yushengsu-thu commented Feb 24, 2025

This codebase adds support for AMD GPUs (ROCm).
[Bug fixes]

  • [Done] Fix the (AMD) torch and Ray issue
  • [Done] Add the conditions to enable (AMD) torch in the codebase (a sketch of such a check follows this list)
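For reference, a minimal sketch of what such a condition can look like. This is an assumption about the general approach, not the exact check used in this PR; it relies on ROCm builds of PyTorch setting torch.version.hip:

import torch

def is_rocm_torch() -> bool:
    # ROCm builds of PyTorch expose a HIP version string here;
    # CUDA builds set torch.version.hip to None.
    return getattr(torch.version, "hip", None) is not None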

[Passing test cases] run_qwen2-7b_seq_balance.sh, run_qwen2-7b_rm_seq_balance.sh

  • [Done] Convergence test done
  • [Done] Throughput test done

Tutorial:

Special thanks for the collaboration and help from:
(SGLang) @zhaochenyang20
(VeRL) @PeterSH6
(AMD) @yushengsu-thu @vickytsang @xiaodoyu
(AnyScale) @hongpeng-guo @kevin85421

@@ -1,3 +1,13 @@
data
Collaborator


Should these be deleted?

author='Bytedance - Seed - MLSys',
author_email='zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk',
description='veRL: Volcano Engine Reinforcement Learning for LLM',
# install_requires=install_requires,
Collaborator


Just wondering if the setup file is ready? It seems that it doesn't load requirements_amd.txt.

Author

@yushengsu-thu commented Feb 24, 2025


@PeterSH6 Guangming
If users want to run on AMD GPUs, they should follow this AMD doc to install VeRL.

Setup the Environment

pip install -r requirements_amd_no_deps.txt --no-deps
pip install -r requirements_amd.txt

I was thinking of merging this manual installation process into setup.py so that users can finish installing with a single pip install -e .:

(1) I can modify setup.py to have two parts: (1.1) one that detects AMD Torch and follows the installation process I provide, and (1.2) another that executes the original process in setup.py if AMD Torch is not detected.

(2) I previously tried to modify setup.py as described in (1), but it still executed the original setup.py. It seems likely that I also need to make some modifications to pyproject.toml to enable (1).

Do you think (1) is a good idea? Also, I'm not very familiar with solving (2). If you have time, could we quickly sync offline to figure out how to handle this part?
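As a starting point for the offline sync, here is a rough sketch of idea (1). This is purely an assumption about how the branching could look, not the implementation in this PR, and the --no-deps file would still need a separate pip pass, since install_requires cannot express --no-deps:

# setup.py (hypothetical sketch)
from pathlib import Path
from setuptools import setup, find_packages

def _is_rocm_torch() -> bool:
    # ROCm builds of PyTorch set torch.version.hip; CUDA builds leave it as None.
    try:
        import torch
        return getattr(torch.version, "hip", None) is not None
    except ImportError:
        return False

# (1.1) AMD torch detected -> AMD requirements; (1.2) otherwise the original list.
req_file = "requirements_amd.txt" if _is_rocm_torch() else "requirements.txt"
install_requires = [
    line.strip()
    for line in Path(req_file).read_text().splitlines()
    if line.strip() and not line.startswith("#")
]

setup(
    name="verl",
    packages=find_packages(),
    install_requires=install_requires,
)

On (2): with PEP 517 build isolation, pip runs setup.py in an isolated environment where the installed torch is not visible, so the AMD branch may never trigger; that could be why pyproject.toml also needs changes (e.g., disabling isolation or adjusting the build requirements).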

vllm_version = '0.6.4'
from .vllm_v_0_6_4_rocm624.llm import LLM
from .vllm_v_0_6_4_rocm624.llm import LLMEngine
from .vllm_v_0_6_4_rocm624 import parallel_state
Collaborator


It seems that the relevant ROCm vLLM is not uploaded yet.
I wonder if AMD supports vLLM > 0.7 at the moment?

Author

@yushengsu-thu commented Feb 24, 2025


@PeterSH6 Guangming

  1. It was ignored; let me add it back. (You can see it now.)
  2. At the same time, I'm testing vLLM > 0.7, since AMD does not seem to have released a vLLM > 0.7-ready version yet. I'm figuring this out with the AMD vLLM team now.


I compiled vllm==0.7.3 on MI300 and tried it with verl. It works if I disable the sleep feature of vLLM, but vLLM has not yet added sleep-mode support on AMD devices.

About AMD sleep mode support
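For anyone reproducing this, a minimal sketch of the workaround, assuming the vLLM >= 0.7 offline API (the model name here is just the Qwen2-7B from the test scripts above):

from vllm import LLM

# Sleep mode (sleep()/wake_up()) relies on CUDA-specific allocator hooks,
# so keep it disabled on AMD devices until vLLM supports it on ROCm.
llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",
    enable_sleep_mode=False,  # the default; shown explicitly for clarity
)
outputs = llm.generate("Hello from MI300!")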

@zhaochenyang20

Nice work!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already, but the status is still pending? Let us recheck it.

@zhaochenyang20

@yushengsu-thu conflicts need to be resolved
