
[ChatLLaMA] RLHF Training: dimension mismatch #312

Open
BigRoddy opened this issue Mar 29, 2023 · 3 comments

Comments

@BigRoddy

I am getting the following error when doing RLHF training:
Traceback (most recent call last):
File "/code/main.py", in
rlhf_trainer.train()
File "/code/trainer.py", in train
self.learn(memories)
File "/code/trainer.py", in learn
surr1 = advantages * ratios
RuntimeError: The size of tensor a (29) must match the size of tensor b (38) at non-singleton dimension 1

Here are the shapes of some of the relevant tensors:
rewards shape: torch.Size([1, 29])
old_values shape: torch.Size([1, 29])
actions_logits shape: torch.Size([1, 38, 50272])
old_actions_log_probs shape: torch.Size([1, 38])
ratios shape: torch.Size([1, 38])
advantages shape: torch.Size([1, 29])
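
For reference, a minimal standalone snippet (my own sketch, not code from the repo) reproduces the same broadcasting failure with these shapes:

import torch

# advantages come from rewards/values (29 steps under the critic tokenizer),
# ratios come from action log-probs (38 steps under the actor tokenizer)
advantages = torch.randn(1, 29)
ratios = torch.randn(1, 38)

surr1 = advantages * ratios  # RuntimeError: The size of tensor a (29) must match
                             # the size of tensor b (38) at non-singleton dimension 1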

Could this be because my actor and critic use models from different families (opt-125m and gpt2)?

@BigRoddy
Author

BigRoddy commented Mar 30, 2023

https://github.com/nebuly-ai/nebullvm/blob/f8796c25aa6b5428a16c4929cdcfe7ea9f5b3f27/apps/accelerate/chatllama/chatllama/rlhf/trainer.py#L323
https://github.com/nebuly-ai/nebullvm/blob/f8796c25aa6b5428a16c4929cdcfe7ea9f5b3f27/apps/accelerate/chatllama/chatllama/rlhf/trainer.py#L348

It looks like these two pieces of code compute the wrong quantity. What they are meant to compute is the length of the actions under the actor's and the critic's tokenizer encodings, respectively. The actor side does compute the length of the actions and assigns it to action_len_actor, but the critic side computes the length of the input states_critic and assigns that to action_len_critic, which results in the dimension mismatch between the logits and the values.
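
To make this concrete, here is a small sketch of my own (not repo code) showing that the same action text tokenizes to different lengths under opt-125m and gpt2, which is why the critic's action length has to be recomputed from the critic-encoded actions rather than reused from the actor or taken from the states:

from transformers import AutoTokenizer

actor_tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
critic_tokenizer = AutoTokenizer.from_pretrained("gpt2")

action_text = "Sure, here is a short answer to your question."
len_actor = len(actor_tokenizer(action_text)["input_ids"])
len_critic = len(critic_tokenizer(action_text)["input_ids"])
print(len_actor, len_critic)  # counts generally differ (e.g. the OPT tokenizer prepends a BOS token)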

@PierpaoloSorbellini
Collaborator

Hi @BigRoddy, thanks for reaching out!
Yep, I think this could be a bug in the code; we haven't fully tested training with different tokenizers, even though we have started to implement it.
Thanks for the issue and the detailed analysis. If you want to contribute to the repo, feel free to open a PR with the adjustment!

@PierpaoloSorbellini
Collaborator

PierpaoloSorbellini commented Apr 3, 2023

Hi @BigRoddy
This should fix your issue if I understand correctly.
You can find all the source code in PR #306.

if self.use_same_tokenizer:
    sequences_critic = sequences_actor
    sequences_mask_critic = sequences_mask_actor
    action_len_critic = action_len_actor
else:
    encoded_critic = change_tokenization(
        sequences_actor,
        self.actor.tokenizer,
        self.critic.tokenizer,
    )
    # split encoded_critic into token ids and attention mask
    sequences_critic = encoded_critic["input_ids"].to(
        sequences_actor.device,
    )
    sequences_mask_critic = (
        encoded_critic["attention_mask"]
        .to(sequences_actor.device)
        .long()
        .detach()
    )

    # compute len of actions for the critic tokenizer
    action_len_critic = states_critic.shape[1]
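
For context, a helper like change_tokenization essentially decodes with the source tokenizer and re-encodes with the destination tokenizer. A minimal sketch of that idea (an illustration only, not the actual code from PR #306) looks like:

def change_tokenization(sequences, tokenizer_src, tokenizer_dst):
    # decode the token ids produced by the source (actor) tokenizer back to text
    texts = tokenizer_src.batch_decode(sequences, skip_special_tokens=True)
    # re-encode the same text with the destination (critic) tokenizer;
    # returns a BatchEncoding with "input_ids" and "attention_mask"
    # (batched inputs of uneven length would additionally need a pad token)
    return tokenizer_dst(texts, return_tensors="pt")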
