[ChatLLaMA] RLHF Training: dimension mismatch #312
https://github.com/nebuly-ai/nebullvm/blob/f8796c25aa6b5428a16c4929cdcfe7ea9f5b3f27/apps/accelerate/chatllama/chatllama/rlhf/trainer.py#L323 It looks like these two pieces of code compute the wrong lengths. The intent is to compute the length of the actions under the actor and the critic tokenizer encodings separately. The actor branch does compute the length of the actions and assigns it to action_len_actor, but the critic branch computes the length of the input states_critic and assigns that to action_len_critic, which produces a mismatch between the dimensions of logits and values.
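For clarity, here is a minimal sketch of how the two lengths would need to be computed so that logits and values share the same time dimension. This is not the actual trainer.py code; the names actions, response_text, critic_tokenizer, states_critic, action_len_actor and action_len_critic are assumptions based on the description above, and a Hugging Face-style tokenizer is assumed.

```python
# Sketch only; names are illustrative, not the real chatllama variables.

# Actor side: length of the generated action tokens under the actor tokenizer.
action_len_actor = actions.shape[1]

# Critic side: re-tokenize the *same generated response text* with the critic
# tokenizer and take its length. Using the length of the full critic input
# (states_critic = prompt + response) instead, as reported above, leaves
# `values` with a different time dimension than `logits`.
actions_critic = critic_tokenizer(response_text, return_tensors="pt").input_ids
action_len_critic = actions_critic.shape[1]  # not states_critic.shape[1]
```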
Hi @BigRoddy, thanks for reaching out!
I am getting the following error when doing RLHF training:
Traceback (most recent call last):
File "/code/main.py", in
rlhf_trainer.train()
File "/code/trainer.py", in train
self.learn(memories)
File "/code/trainer.py", in learn
surr1 = advantages * ratios
RuntimeError: The size of tensor a (29) must match the size of tensor b (38) at non-singleton dimension 1
And here are the shapes of some of the tensors:
rewards shape: torch.Size([1, 29])
old_values shape: torch.Size([1, 29])
actions_logits shape: torch.Size([1, 38, 50272])
old_actions_log_probs shape: torch.Size([1, 38])
ratios shape: torch.Size([1, 38])
advantages shape: torch.Size([1, 29])
Could this be because my actor and critic use models from different families (opt-125m and gpt2)?
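For what it's worth, a length mismatch between tokenizers from different model families is easy to check outside the project. This standalone snippet (model names chosen to match the ones mentioned above) only shows that the two tokenizers can produce different token counts for the same string, which is why per-token tensors built from each side end up with incompatible shapes:

```python
# Standalone check, unrelated to chatllama internals: compare token counts
# from the actor (opt-125m) and critic (gpt2) tokenizers for the same text.
from transformers import AutoTokenizer

actor_tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
critic_tok = AutoTokenizer.from_pretrained("gpt2")

text = "Explain how RLHF fine-tuning updates the policy."

actor_len = len(actor_tok(text).input_ids)
critic_len = len(critic_tok(text).input_ids)

# If the counts differ (e.g. the OPT tokenizer prepends a BOS token), any
# element-wise op between actor-side tensors (logits, log-probs, ratios) and
# critic-side tensors (values, rewards, advantages) fails exactly like
# `advantages * ratios` in the traceback above.
print("actor tokens:", actor_len)
print("critic tokens:", critic_len)
```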