Add Remote Reward Server Feature #329

YuchenFan48 · 2025-02-20T12:08:06Z

Description

This PR adds support for remote generative models in VERL's verification and reward assignment process. Previously, VERL supported only rule-based rewards for verification. With this update, rewards can also be produced by a generative model served remotely, which allows reward mechanisms that fixed rules cannot express.

Key Changes

Bug Fix:
- Resolved an issue in main_ppo.py where the val_dataloader was incorrectly processed in batches. The fix ensures that validation data is handled properly during training.
New Features:
- Added a Generative Reward Manager in workers/reward_manager to handle reward generation using remote generative models.
- Implemented a corresponding compute_score function in utils/reward_score to calculate scores based on generative model outputs.
Issue Resolution:
- Closes Issue #269 and Issue #229: Adds support for remote generative reward mechanisms.
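
For readers unfamiliar with the idea, the flow described above (send each response to a remote generative judge, then turn its text reply into a numeric reward) can be sketched roughly as follows. This is a minimal illustration and not the code added in this PR: the helper names (`build_prompt`, `parse_score`, `score_batch`, `fake_judge`) and the `Score: <number>` reply format are assumptions made for the example, and the remote call is replaced by a local stub so the snippet runs without a server.

```python
import asyncio
import re
from typing import Awaitable, Callable, List

# Assumption: a "judge" is any async function that takes a prompt and returns
# the generative model's raw text reply (e.g. a wrapper around an HTTP call).
JudgeFn = Callable[[str], Awaitable[str]]

# Assumption: the judge is instructed to end its reply with "Score: <number>".
SCORE_PATTERN = re.compile(r"Score:\s*([01](?:\.\d+)?)")


def build_prompt(solution: str, ground_truth: str) -> str:
    """Format a grading request for the judge model (hypothetical template)."""
    return (
        "Grade the response against the reference.\n"
        f"Reference: {ground_truth}\n"
        f"Response: {solution}\n"
        "Reply with 'Score: 1' if correct, otherwise 'Score: 0'."
    )


def parse_score(reply: str) -> float:
    """Take the last 'Score: x' in the reply, clamped to [0, 1]; 0.0 if absent."""
    matches = SCORE_PATTERN.findall(reply)
    if not matches:
        return 0.0
    return min(max(float(matches[-1]), 0.0), 1.0)


async def compute_score(judge: JudgeFn, solution: str, ground_truth: str) -> float:
    """Ask the judge to grade one response and convert its reply to a reward."""
    reply = await judge(build_prompt(solution, ground_truth))
    return parse_score(reply)


async def score_batch(
    judge: JudgeFn, solutions: List[str], ground_truths: List[str]
) -> List[float]:
    """Grade a batch concurrently so slow remote calls overlap."""
    tasks = [compute_score(judge, s, g) for s, g in zip(solutions, ground_truths)]
    return list(await asyncio.gather(*tasks))


async def fake_judge(prompt: str) -> str:
    """Local stand-in for the remote model: exact-match grading."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    ref = re.search(r"^Reference: (.*)$", prompt, re.M).group(1)
    resp = re.search(r"^Response: (.*)$", prompt, re.M).group(1)
    return "Score: 1" if ref.strip() == resp.strip() else "Score: 0"


if __name__ == "__main__":
    print(asyncio.run(score_batch(fake_judge, ["4", "5"], ["4", "4"])))
```

In a real setup, `fake_judge` would be swapped for a client that queries the reward server. Falling back to 0.0 when no score can be parsed is one possible design choice; it keeps a malformed judge reply from crashing training, at the cost of silently penalizing that sample.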

examples/ppo_trainer/run_ppo.sh

vermouth1992 · 2025-02-22T01:54:11Z

examples/sft/gsm8k/run_qwen_05_peft.sh

Why do we have an empty file here?

I have resolved the empty file problem

vermouth1992 · 2025-02-22T01:54:25Z

examples/trainer.experiment_name=

Could you remove this file?

File removed

vermouth1992 · 2025-02-22T01:55:08Z

run_ppo.sh

Remove this file as well

Sure, file removed already

PeterSH6 · 2025-02-24T11:53:47Z

verl/utils/reward_score/generative.py

@@ -0,0 +1,163 @@
+import re


Please add a license similar to other files. Thanks!

License added.

PeterSH6 · 2025-02-24T11:54:07Z

verl/workers/reward_manager/generative.py

@@ -0,0 +1,89 @@
+import asyncio


Also add a license similar to other files

License added.

CLAassistant · 2025-02-26T00:32:27Z

All committers have signed the CLA.

YuchenFan48 · 2025-02-26T02:04:13Z

CLA signed :)

YuchenFan48 added 7 commits February 20, 2025 18:35

Remote Gen Reward

a46876a

Remote Gen Reward

c351370

Remote Gen Reward

8cbe8a2

Remote Gen Reward

9c2e861

Update

ebbfe37

Update

44a6cec

Update

148a38a

vermouth1992 reviewed Feb 20, 2025

examples/ppo_trainer/run_ppo.sh (outdated, resolved)

Update run_ppo.sh

56a13a8

vermouth1992 reviewed Feb 22, 2025

YuchenFan48 added 2 commits February 22, 2025 15:01

Update

41ba36b

Update

a1c9b55

PeterSH6 requested changes Feb 24, 2025

Add Licences

06acafd
