
adds support for override compute path #323

Open
wants to merge 2 commits into main
Conversation

@Manto commented Feb 19, 2025

It looks like main_ppo is set up to take a custom reward function via the compute_score argument, but currently there's no way to override it from main().

This PR allows passing in a custom reward function via config, e.g.:

python3 -m verl.trainer.main_ppo \
    data.train_files=$DATA_DIR/train.parquet \
    data.val_files=$DATA_DIR/test.parquet \
    ...
    +compute_score_path=your_dataset.scoring_fn
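For reference, a dotted path like your_dataset.scoring_fn can be resolved with importlib. This is only a minimal sketch of how the override could be wired up; the helper name load_compute_score is illustrative and not the actual diff:

import importlib

def load_compute_score(dotted_path):
    # Split "your_dataset.scoring_fn" into module and attribute parts.
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    # Return the callable so main() can pass it through as compute_score.
    return getattr(module, attr)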

@vermouth1992 (Collaborator)

Nice feature! Could you add a test to protect this functionality? Otherwise, it would be very easy to break.
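Such a test might look like the following. This is a hypothetical pytest sketch reusing the load_compute_score helper assumed above; the tests.test_custom_reward module path is made up for illustration:

def scoring_fn(solution_str, ground_truth):
    # Trivial stand-in reward function used only by this test.
    return 1.0 if solution_str == ground_truth else 0.0

def test_compute_score_path_override():
    # The dotted path should resolve to a callable with reward semantics.
    fn = load_compute_score("tests.test_custom_reward.scoring_fn")
    assert fn("42", "42") == 1.0
    assert fn("42", "43") == 0.0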

@uygnef (Contributor) commented Feb 20, 2025

What if we add a separate model zoo path to make it easier for users to customize and share reward models/functions? This would decouple it from the Verl project source code, improving modularity and usability.

yu@bogon verl % tree verl
.
├── LICENSE
├── Notice.txt
├── README.md
├── docker
 ...
├── model_zoo
│   └── reward_model
│       └── openmath.py

For example, the main_task function could be updated as follows:

def main_task(config, compute_score=None):
    ...
    if config.reward_model.enable:
        if config.reward_model.strategy == 'fsdp':
            if config.reward_model.name == 'RewardModelWorker':
                # Built-in worker shipped with verl.
                from verl.workers.fsdp_workers import RewardModelWorker
            else:
                # Custom worker loaded from the proposed model zoo.
                from verl.utils.import_utils import load_custom_models
                reward_module = load_custom_models('reward_model', config.reward_model.name)
                RewardModelWorker = reward_module.CustomRewardModelWorker
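load_custom_models doesn't exist in verl yet; as a rough sketch, and assuming model zoo modules live at model_zoo/<category>/<name>.py (e.g. model_zoo/reward_model/openmath.py), it could be implemented as a file-based import:

import importlib.util
import os

def load_custom_models(category, name):
    # Assumed layout: model_zoo/<category>/<name>.py.
    path = os.path.join('model_zoo', category, f'{name}.py')
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Execute the module so its classes (e.g. CustomRewardModelWorker)
    # are defined and can be looked up by the caller.
    spec.loader.exec_module(module)
    return module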

@PeterSH6 (Collaborator)
> What if we add a separate model zoo path to make it easier for users to customize and share reward models/functions? [...]

@uygnef Nice feature. It should be very useful for customizing reward models. Would you like to submit a PR for this feature?

@uygnef (Contributor) commented Feb 24, 2025

> @uygnef Nice feature. It should be very useful for customizing reward models. Would you like to submit a PR for this feature?

Sure, I'll submit it later.
