
🧗 Add GRPO Trainer support for third-party accelerators #2836

Merged

10 commits merged into huggingface:main on Feb 27, 2025

Conversation

@ji-huazhong (Contributor) commented on Feb 12, 2025:

What does this PR do?

This PR makes the GRPO Trainer work out of the box on Ascend NPUs.
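For context, the device-agnostic pattern this boils down to looks roughly like the sketch below (not the exact merged code; it assumes accelerate is installed and uses its `PartialState.default_device` to pick the backend):

```python
import torch
from accelerate import PartialState

# Resolve the accelerator family once ("cuda", "npu", ...) and fetch the
# matching torch backend module instead of hard-coding torch.cuda.
device_type = PartialState().default_device.type
device_module = getattr(torch, device_type)

print(f"{device_type}: {device_module.device_count()} device(s) visible")
```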

cc @qgallouedec @lewtun

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ji-huazhong force-pushed the npu branch 3 times, most recently from 48739bc to 86c5569 on February 12, 2025 09:05
@Superskyyy (Contributor) commented:

Since the vllm device patch is growing larger, it might be wise to move it into a utility module instead. Wdyt?
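For illustration, such a utility could look like this sketch (the module path, function name, and NPU branch are hypothetical; the mem_get_info patch mirrors a later commit in this PR):

```python
# Hypothetical utility module, e.g. trl/extras/vllm_device_patch.py
from contextlib import ExitStack, contextmanager
from unittest.mock import patch


@contextmanager
def vllm_device_patches(device_type: str):
    """Group the device-specific vLLM workarounds in one place so the
    trainer only has to enter a single context manager."""
    with ExitStack() as stack:
        if device_type == "npu":
            import torch_npu  # assumes Ascend's torch_npu package is installed

            # Redirect the CUDA memory query vLLM performs to the NPU backend.
            stack.enter_context(
                patch("torch.cuda.mem_get_info", torch_npu.npu.mem_get_info)
            )
        yield
```

The trainer would then wrap vLLM engine creation in `with vllm_device_patches(device_type): ...`, keeping grpo_trainer.py free of backend-specific details.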

@ji-huazhong ji-huazhong changed the title Add GRPO Trainer support for Ascend NPU Add GRPO Trainer support for third-party accelerators Feb 13, 2025
@baymax591 commented:

This PR helps a lot; I hope it can speed up the integration.

ji-huazhong and others added 2 commits February 14, 2025 21:18
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
@ji-huazhong (Contributor, Author) commented:

I think this PR is ready to be merged 🤗 @qgallouedec

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec (Member) commented on Feb 14, 2025:

Can you make sure to run make precommit to apply the style 🙏

@ji-huazhong (Contributor, Author) commented:

make precommit was executed successfully locally.

@lynnzhiyun commented:

Hi @ji-huazhong, thank you for your excellent work! This PR has been incredibly helpful in enabling me to train models with GRPO on the NPU smoothly.

I'd like to ask whether this PR is ready to be merged; I'd be extremely grateful if it could be merged promptly.

cc @qgallouedec

```diff
 # Check that the requested device is available
-if vllm_device.split(":")[0] == "cuda" and int(vllm_device.split(":")[1]) >= torch.cuda.device_count():
+if (
+    vllm_device.split(":")[0] == f"{device_type}"
```
@qgallouedec (Member) commented on the diff:

this should always be the case, no?

@ji-huazhong (Contributor, Author) replied on Feb 18, 2025:

Hi @qgallouedec,

Thanks for your review. In line 387, I kept the same logic as the original conditional statement, only replacing the hard-coded 'cuda' type with the more general device type.

I believe the check for device availability here is necessary. However, perhaps we could split this conditional statement into two parts.

First, we check if the device type matches, and only after this condition is met do we check if the device index is within the range of available devices. wdyt?
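A minimal sketch of that split, assuming device_type (e.g. "cuda" or "npu") and a matching device_module (e.g. torch.cuda or torch.npu) were resolved earlier:

```python
# Sketch only; variable names follow the discussion, not the merged code.
requested_type, requested_index = vllm_device.split(":")

# Step 1: does the requested device family match the current backend?
if requested_type == device_type:
    # Step 2: only now check that the index is within the visible range.
    if int(requested_index) >= device_module.device_count():
        raise ValueError(
            f"Requested device {vllm_device} is not available; only "
            f"{device_module.device_count()} {device_type} device(s) found."
        )
```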

@ji-huazhong (Contributor, Author) commented on Feb 19, 2025:

[asciicast recording]
I ran a test on an Ascend NPU using the GRPO script provided by open-r1, and it works 🤗

Since a single GRPO training step takes a long time, only the output of the first 4 steps is shown here; after that I pressed Ctrl-C to exit.

@ji-huazhong (Contributor, Author) commented:

Hi @kashif, the failing test case seems unrelated to this PR. Could you take a look? Thanks!

@qgallouedec (Member) left a review:

Thanks for iterating. I can't test this myself, but the new changes don't break the current support. Let's merge!

@ji-huazhong (Contributor, Author) commented:

The failing test case is due to an error when accessing huggingface.co and has nothing to do with this PR. cc @qgallouedec

@qgallouedec qgallouedec changed the title Add GRPO Trainer support for third-party accelerators 🧗 Add GRPO Trainer support for third-party accelerators Feb 27, 2025
@qgallouedec qgallouedec merged commit 27a6f22 into huggingface:main Feb 27, 2025
12 of 13 checks passed
@ji-huazhong ji-huazhong deleted the npu branch February 27, 2025 12:28
jhinpan pushed a commit to jhinpan/trl-jin that referenced this pull request on Mar 12, 2025:

🧗 Add GRPO Trainer support for third-party accelerators (#2836)

* Add GRPO Trainer support for Ascend NPU

* Update grpo_trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* code format

* Update grpo_trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* patch mem_get_info

* style

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>