
Support AMD (Rocm kernel) #360

Open
yushengsu-thu wants to merge 6 commits into main

Conversation


@yushengsu-thu commented Feb 24, 2025

This codebase adds support for AMD GPUs (ROCm).
[Bug fixes]

  • [Done] Fix the (AMD) torch and Ray issue
  • [Done] Add the conditions to enable (AMD) torch in the codebase (a sketch of such a check follows this list)
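For reference, a minimal sketch of what such a condition can look like. This is an assumption about the general approach, not the exact check used in this PR; it relies on ROCm builds of PyTorch setting torch.version.hip:

import torch

def is_rocm_torch() -> bool:
    # ROCm builds of PyTorch expose a HIP version string here;
    # CUDA builds set torch.version.hip to None.
    return getattr(torch.version, "hip", None) is not None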

[Passing test cases] run_qwen2-7b_seq_balance.sh, run_qwen2-7b_rm_seq_balance.sh

  • [Done] Convergence test done
  • [Done] Throughput test done

Tutorial:

Special thanks for the collaboration and help from:
(SGLang) @zhaochenyang20
(VeRL) @PeterSH6
(AMD) @yushengsu-thu @vickytsang @xiaodoyu
(AnyScale) @hongpeng-guo @kevin85421

@@ -1,3 +1,13 @@
data
Collaborator


Should these be deleted?

author='Bytedance - Seed - MLSys',
author_email='zhangchi.usc1992@bytedance.com, gmsheng@connect.hku.hk',
description='veRL: Volcano Engine Reinforcement Learning for LLM',
# install_requires=install_requires,
Collaborator


Just wondering if the setup file is ready? It seems that it doesn't load requirements_amd.txt.

Author

@yushengsu-thu commented Feb 24, 2025


@PeterSH6 Guangming
If users want to run on AMD GPUs, they should follow this AMD doc to install VeRL.

Setup the Environment

pip install -r requirements_amd_no_deps.txt --no-deps
pip install -r requirements_amd.txt

I was thinking of merging this manual installation process into setup.py so that users can finish installing with a single pip install -e .:

(1) I can modify setup.py to have two parts: (1.1) one that detects AMD Torch and follows the installation process I provide, and (1.2) another that executes the original process in setup.py if AMD Torch is not detected.

(2) I previously tried to modify setup.py as described in (1), but it still executed the original setup.py. It seems likely that I also need to make some modifications to pyproject.toml to enable (1).

Do you think (1) is a good idea? Also, I'm not very familiar with solving (2). If you have time, could we quickly sync offline to figure out how to handle this part?
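As a starting point for the offline sync, here is a rough sketch of idea (1). This is purely an assumption about how the branching could look, not the implementation in this PR, and the --no-deps file would still need a separate pip pass, since install_requires cannot express --no-deps:

# setup.py (hypothetical sketch)
from pathlib import Path
from setuptools import setup, find_packages

def _is_rocm_torch() -> bool:
    # ROCm builds of PyTorch set torch.version.hip; CUDA builds leave it as None.
    try:
        import torch
        return getattr(torch.version, "hip", None) is not None
    except ImportError:
        return False

# (1.1) AMD torch detected -> AMD requirements; (1.2) otherwise the original list.
req_file = "requirements_amd.txt" if _is_rocm_torch() else "requirements.txt"
install_requires = [
    line.strip()
    for line in Path(req_file).read_text().splitlines()
    if line.strip() and not line.startswith("#")
]

setup(
    name="verl",
    packages=find_packages(),
    install_requires=install_requires,
)

On (2): with PEP 517 build isolation, pip runs setup.py in an isolated environment where the installed torch is not visible, so the AMD branch may never trigger; that could be why pyproject.toml also needs changes (e.g., disabling isolation or adjusting the build requirements).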

vllm_version = '0.6.4'
from .vllm_v_0_6_4_rocm624.llm import LLM
from .vllm_v_0_6_4_rocm624.llm import LLMEngine
from .vllm_v_0_6_4_rocm624 import parallel_state
Collaborator


It seems that the relevant ROCm vLLM is not uploaded yet.
I wonder if AMD supports vLLM > 0.7 at the moment?

Author

@yushengsu-thu commented Feb 24, 2025


@PeterSH6 Guangming

  1. It was ignored; let me add it back. (You can see it now.)
  2. At the same time, I'm testing vLLM > 0.7, since AMD does not seem to have released a vLLM > 0.7-ready version yet. I'm figuring this out with the AMD vLLM team now.


I compiled vllm==0.7.3 on MI300 and tried it with verl. It works if I disable the sleep feature of vLLM, but vLLM has not yet added sleep-mode support on AMD devices.

About AMD sleep mode support
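For anyone reproducing this, a minimal sketch of the workaround, assuming the vLLM >= 0.7 offline API (the model name here is just the Qwen2-7B from the test scripts above):

from vllm import LLM

# Sleep mode (sleep()/wake_up()) relies on CUDA-specific allocator hooks,
# so keep it disabled on AMD devices until vLLM supports it on ROCm.
llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",
    enable_sleep_mode=False,  # the default; shown explicitly for clarity
)
outputs = llm.generate("Hello from MI300!")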

@zhaochenyang20

Nice work!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already, but the status is still pending? Let us recheck it.

@zhaochenyang20

@yushengsu-thu conflicts need to be resolved
