fix: paligemma not return loss when not use suffix #30992

Closed

Conversation

@MrToy commented May 23, 2024

What does this PR do?

Following up on #30967: the case of not using a suffix was overlooked, so training PaliGemma without one raises an error:

ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask,pixel_values,labels.
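
For context, a minimal sketch of the setup that hits this error (the checkpoint name, placeholder image, and the manually attached labels are my assumptions for illustration, not taken from the PR):

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.new("RGB", (224, 224))  # placeholder image
# No `suffix` is passed, so the processor does not build `labels` itself.
inputs = processor(text="caption en", images=image, return_tensors="pt")
# Labels attached elsewhere (e.g. by a data collator); assumed for illustration.
inputs["labels"] = inputs["input_ids"].clone()
outputs = model(**inputs)
print(outputs.loss)  # the code path this PR targets returned logits but no loss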

cc @ArthurZucker @molbap

@ArthurZucker (Collaborator)

But you are not supposed to train without a suffix, which is why I am not sure I understand the issue.
prefix: the prompt
suffix: the target text.
If you want to train on the prompt, then set suffix = prompt.
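
As a hedged illustration of this intended setup (the prompt and target strings are made up; `processor`, `model`, and `image` as in the sketch above), passing the target as `suffix` makes the processor build `labels` over the suffix tokens only:

inputs = processor(
    text="answer en where is the cow standing?",  # prefix / prompt
    suffix="on the beach",                        # target; processor builds `labels`
    images=image,
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.loss)  # loss is computed on the suffix tokens only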

@MrToy (Author) commented May 23, 2024

My scenario is a conversation over an image, so I fine-tune it the way other VLMs are fine-tuned: using tokenizer.apply_chat_template to build an autoregressive prompt rather than prefix + suffix. I think the suffix should be optional?

@ArthurZucker (Collaborator)

No, because you specifically need the suffix. That's just how the attention mask needs to be created for the model.

@ArthurZucker (Collaborator)

I'll let @molbap sort this out, but TL;DR: you have to make sure the attention on the text you want to train on is causal, whereas the image gets full attention and the prompt (not the suffix) should also be bidirectional. I'm not sure training will work well if you don't follow this.
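
To make the described layout concrete, here is a torch-only illustration (the toy sizes and the mask-building loop are mine, not the model's actual implementation): image and prefix tokens attend to each other bidirectionally, while suffix tokens are causal:

import torch

n_image, n_prefix, n_suffix = 4, 3, 3  # toy sizes
n = n_image + n_prefix + n_suffix

# mask[i, j] == True means position i may attend to position j
mask = torch.zeros(n, n, dtype=torch.bool)
# full (bidirectional) attention over image + prefix tokens
mask[: n_image + n_prefix, : n_image + n_prefix] = True
# each suffix token sees everything before it, causal among suffix tokens
for i in range(n_image + n_prefix, n):
    mask[i, : i + 1] = True
print(mask.int())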

@MrToy closed this May 23, 2024
@ArthurZucker (Collaborator)

@MrToy hope we were helpful! 🤗

@MrToy (Author) commented May 23, 2024

Thank you! I read PaliGemma's design again; causal attention indeed exists only in the suffix. I will look for some trick to format the messages, e.g. prompt="" suffix=formatted_messages.

Although it's incorrect, this change works well for my training. The model works even without unmasking the prefix part 🤣
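
A sketch of that trick (the message contents are assumptions, PaliGemma's tokenizer may not ship a chat template so you might need to set your own, and whether an empty prefix is accepted may depend on the processor version; `processor`, `model`, `image` as in the earlier sketch):

messages = [{"role": "user", "content": "describe the image"}]  # assumed
formatted_messages = processor.tokenizer.apply_chat_template(
    messages, tokenize=False
)
inputs = processor(
    text="",                    # empty prefix
    suffix=formatted_messages,  # whole conversation trained on causally
    images=image,
    return_tensors="pt",
)
outputs = model(**inputs)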

@ArthurZucker (Collaborator)

Awesome! Hope you can share your findings with the community as well! 🚀

@molbap (Contributor) commented May 24, 2024

That's great @MrToy! If you need to pass a custom attention mask, you can also build the inputs entirely yourself:

import torch

# assuming you have `processor`, `formatted_messages`, and your images ready
inputs = processor(text=formatted_messages, images=..., return_tensors="pt", padding="longest")
# this won't set token_type_ids since you don't pass a suffix, but you can do
inputs["token_type_ids"] = torch.ones_like(inputs["input_ids"])

which will set all tokens that you send to be of type "1", that is, of type suffix. Then

outputs = model(**inputs)

should also work.
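
For the forward pass to return a loss you would still need labels; a hedged continuation of the snippet above (assuming you want to train on every text token):

# continuing the snippet above; train on all text tokens (assumption)
inputs["labels"] = inputs["input_ids"].clone()
outputs = model(**inputs)
print(outputs.loss)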
