
adds support for override compute path #323

Open
wants to merge 2 commits into main
Conversation

@Manto commented Feb 19, 2025

It looks like main_ppo is set up to take a custom reward function via the compute_score argument, but currently there's no way to override it from main().

This PR allows passing in a custom reward function via config, e.g.:

python3 -m verl.trainer.main_ppo \
    data.train_files=$DATA_DIR/train.parquet \
    data.val_files=$DATA_DIR/test.parquet \
    ...
    +compute_score_path=your_dataset.scoring_fn
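For reference, a dotted path like your_dataset.scoring_fn can be resolved with importlib. This is only a minimal sketch of how the override could be wired up; the helper name load_compute_score is illustrative and not the actual diff:

import importlib

def load_compute_score(dotted_path):
    # Split "your_dataset.scoring_fn" into module and attribute parts.
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    # Return the callable so main() can pass it through as compute_score.
    return getattr(module, attr)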

@vermouth1992 (Collaborator)

Nice feature! Could you add a test to protect this functionality? Otherwise, it would be very easy to break.
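Such a test might look like the following. This is a hypothetical pytest sketch reusing the load_compute_score helper assumed above; the tests.test_custom_reward module path is made up for illustration:

def scoring_fn(solution_str, ground_truth):
    # Trivial stand-in reward function used only by this test.
    return 1.0 if solution_str == ground_truth else 0.0

def test_compute_score_path_override():
    # The dotted path should resolve to a callable with reward semantics.
    fn = load_compute_score("tests.test_custom_reward.scoring_fn")
    assert fn("42", "42") == 1.0
    assert fn("42", "43") == 0.0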

@uygnef (Contributor) commented Feb 20, 2025

What if we add a separate model zoo path to make it easier for users to customize and share reward models/functions? This would decouple it from the Verl project source code, improving modularity and usability.

yu@bogon verl % tree verl
.
├── LICENSE
├── Notice.txt
├── README.md
├── docker
 ...
├── model_zoo
│   └── reward_model
│       └── openmath.py

For example, the main_task function could be updated as follows:

def main_task(config, compute_score=None):
    ...
    if config.reward_model.enable:
        if config.reward_model.strategy == 'fsdp':
            if config.reward_model.name == 'RewardModelWorker':
                # Built-in worker shipped with verl.
                from verl.workers.fsdp_workers import RewardModelWorker
            else:
                # Custom worker loaded from the proposed model zoo.
                from verl.utils.import_utils import load_custom_models
                reward_module = load_custom_models('reward_model', config.reward_model.name)
                RewardModelWorker = reward_module.CustomRewardModelWorker
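load_custom_models doesn't exist in verl yet; as a rough sketch, and assuming model zoo modules live at model_zoo/<category>/<name>.py (e.g. model_zoo/reward_model/openmath.py), it could be implemented as a file-based import:

import importlib.util
import os

def load_custom_models(category, name):
    # Assumed layout: model_zoo/<category>/<name>.py.
    path = os.path.join('model_zoo', category, f'{name}.py')
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # Execute the module so its classes (e.g. CustomRewardModelWorker)
    # are defined and can be looked up by the caller.
    spec.loader.exec_module(module)
    return module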

@PeterSH6 (Collaborator)
> What if we add a separate model zoo path to make it easier for users to customize and share reward models/functions? [...]

@uygnef Nice feature. It should be very useful for customizing reward models. Would you like to submit a PR for this feature?

@uygnef (Contributor) commented Feb 24, 2025

> @uygnef Nice feature. It should be very useful for customizing reward models. Would you like to submit a PR for this feature?

Sure, I'll submit it later.
