
[ChatLLaMA] RLHF Training: dimension mismatch #312

Open
BigRoddy opened this issue Mar 29, 2023 · 3 comments

Comments

@BigRoddy

I am getting the following error when doing RLHF training:
Traceback (most recent call last):
File "/code/main.py", in
rlhf_trainer.train()
File "/code/trainer.py", in train
self.learn(memories)
File "/code/trainer.py", in learn
surr1 = advantages * ratios
RuntimeError: The size of tensor a (29) must match the size of tensor b (38) at non-singleton dimension 1

Here are the shapes of some of the relevant tensors:
rewards shape: torch.Size([1, 29])
old_values shape: torch.Size([1, 29])
actions_logits shape: torch.Size([1, 38, 50272])
old_actions_log_probs shape: torch.Size([1, 38])
ratios shape: torch.Size([1, 38])
advantages shape: torch.Size([1, 29])
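
For reference, a minimal standalone snippet (my own sketch, not code from the repo) reproduces the same broadcasting failure with these shapes:

import torch

# advantages come from rewards/values (29 steps under the critic tokenizer),
# ratios come from action log-probs (38 steps under the actor tokenizer)
advantages = torch.randn(1, 29)
ratios = torch.randn(1, 38)

surr1 = advantages * ratios  # RuntimeError: The size of tensor a (29) must match
                             # the size of tensor b (38) at non-singleton dimension 1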

Could this be because my actor and critic use models from different families (opt-125m and gpt2)?

@BigRoddy
Author

BigRoddy commented Mar 30, 2023

https://github.com/nebuly-ai/nebullvm/blob/f8796c25aa6b5428a16c4929cdcfe7ea9f5b3f27/apps/accelerate/chatllama/chatllama/rlhf/trainer.py#L323
https://github.com/nebuly-ai/nebullvm/blob/f8796c25aa6b5428a16c4929cdcfe7ea9f5b3f27/apps/accelerate/chatllama/chatllama/rlhf/trainer.py#L348

It looks like these two pieces of code compute the wrong quantity. What they are meant to compute is the length of the actions under the actor's and the critic's tokenizer encodings, respectively. The actor side does compute the length of the actions and assigns it to action_len_actor, but the critic side computes the length of the input states_critic and assigns that to action_len_critic, which results in the dimension mismatch between the logits and the values.
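
To make this concrete, here is a small sketch of my own (not repo code) showing that the same action text tokenizes to different lengths under opt-125m and gpt2, which is why the critic's action length has to be recomputed from the critic-encoded actions rather than reused from the actor or taken from the states:

from transformers import AutoTokenizer

actor_tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
critic_tokenizer = AutoTokenizer.from_pretrained("gpt2")

action_text = "Sure, here is a short answer to your question."
len_actor = len(actor_tokenizer(action_text)["input_ids"])
len_critic = len(critic_tokenizer(action_text)["input_ids"])
print(len_actor, len_critic)  # counts generally differ (e.g. the OPT tokenizer prepends a BOS token)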

@PierpaoloSorbellini
Collaborator

Hi @BigRoddy, thanks for reaching out!
Yep, I think this could be a bug in the code; we haven't fully tested training with different tokenizers, even though we have started to implement it.
Thanks for the issue and the detailed analysis. If you want to contribute to the repo, feel free to open a PR with the adjustment!

@PierpaoloSorbellini
Collaborator

PierpaoloSorbellini commented Apr 3, 2023

Hi @BigRoddy
This should fix your issue if I understand correctly.
You can find all the source code in PR #306.

if self.use_same_tokenizer:
    sequences_critic = sequences_actor
    sequences_mask_critic = sequences_mask_actor
    action_len_critic = action_len_actor
else:
    encoded_critic = change_tokenization(
        sequences_actor,
        self.actor.tokenizer,
        self.critic.tokenizer,
    )
    # split encoded_critic into token ids and attention mask
    sequences_critic = encoded_critic["input_ids"].to(
        sequences_actor.device,
    )
    sequences_mask_critic = (
        encoded_critic["attention_mask"]
        .to(sequences_actor.device)
        .long()
        .detach()
    )

    # compute len of actions for the critic tokenizer
    action_len_critic = states_critic.shape[1]
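
For context, a helper like change_tokenization essentially decodes with the source tokenizer and re-encodes with the destination tokenizer. A minimal sketch of that idea (an illustration only, not the actual code from PR #306) looks like:

def change_tokenization(sequences, tokenizer_src, tokenizer_dst):
    # decode the token ids produced by the source (actor) tokenizer back to text
    texts = tokenizer_src.batch_decode(sequences, skip_special_tokens=True)
    # re-encode the same text with the destination (critic) tokenizer;
    # returns a BatchEncoding with "input_ids" and "attention_mask"
    # (batched inputs of uneven length would additionally need a pad token)
    return tokenizer_dst(texts, return_tensors="pt")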
