fix: paligemma not return loss when not use suffix #30992

Closed

Conversation

@MrToy commented May 23, 2024

What does this PR do?

Following up on #30967: the case of not using a suffix was overlooked, so training PaliGemma without one raises an error:

ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask,pixel_values,labels.
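
For context, a minimal sketch of the setup that hits this error (the checkpoint name, placeholder image, and the manually attached labels are my assumptions for illustration, not taken from the PR):

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.new("RGB", (224, 224))  # placeholder image
# No `suffix` is passed, so the processor does not build `labels` itself.
inputs = processor(text="caption en", images=image, return_tensors="pt")
# Labels attached elsewhere (e.g. by a data collator); assumed for illustration.
inputs["labels"] = inputs["input_ids"].clone()
outputs = model(**inputs)
print(outputs.loss)  # the code path this PR targets returned logits but no loss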

cc @ArthurZucker @molbap

@ArthurZucker (Collaborator)

But you are not supposed to train without a suffix, which is why I am not sure I understand the issue.
prefix: the prompt
suffix: the target text.
If you want to train on the prompt, then set suffix = prompt.
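
As a hedged illustration of this intended setup (the prompt and target strings are made up; `processor`, `model`, and `image` as in the sketch above), passing the target as `suffix` makes the processor build `labels` over the suffix tokens only:

inputs = processor(
    text="answer en where is the cow standing?",  # prefix / prompt
    suffix="on the beach",                        # target; processor builds `labels`
    images=image,
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.loss)  # loss is computed on the suffix tokens only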

@MrToy (Author) commented May 23, 2024

My scenario is a conversation over an image, so I fine-tune it the way other VLMs are fine-tuned: using tokenizer.apply_chat_template to build an autoregressive prompt rather than prefix + suffix. I think the suffix should be optional?

@ArthurZucker (Collaborator)

No, because you specifically need the suffix. That's just how the attention mask needs to be created for the model.

@ArthurZucker (Collaborator)

I'll let @molbap sort this out, but TL;DR: you have to make sure the attention on the text you want to train on is causal, whereas the image gets full attention and the prompt (not the suffix) should also be bidirectional. I'm not sure training will work well if you don't follow this.
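
To make the described layout concrete, here is a torch-only illustration (the toy sizes and the mask-building loop are mine, not the model's actual implementation): image and prefix tokens attend to each other bidirectionally, while suffix tokens are causal:

import torch

n_image, n_prefix, n_suffix = 4, 3, 3  # toy sizes
n = n_image + n_prefix + n_suffix

# mask[i, j] == True means position i may attend to position j
mask = torch.zeros(n, n, dtype=torch.bool)
# full (bidirectional) attention over image + prefix tokens
mask[: n_image + n_prefix, : n_image + n_prefix] = True
# each suffix token sees everything before it, causal among suffix tokens
for i in range(n_image + n_prefix, n):
    mask[i, : i + 1] = True
print(mask.int())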

@MrToy closed this May 23, 2024
@ArthurZucker (Collaborator)

@MrToy hope we were helpful! 🤗

@MrToy (Author) commented May 23, 2024

Thank you! I read PaliGemma's design again; causal attention indeed exists only in the suffix. I will look for some trick to format the messages, e.g. prompt="" suffix=formatted_messages.

Although it's incorrect, this change works well for my training. The model works even without unmasking the prefix part 🤣
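
A sketch of that trick (the message contents are assumptions, PaliGemma's tokenizer may not ship a chat template so you might need to set your own, and whether an empty prefix is accepted may depend on the processor version; `processor`, `model`, `image` as in the earlier sketch):

messages = [{"role": "user", "content": "describe the image"}]  # assumed
formatted_messages = processor.tokenizer.apply_chat_template(
    messages, tokenize=False
)
inputs = processor(
    text="",                    # empty prefix
    suffix=formatted_messages,  # whole conversation trained on causally
    images=image,
    return_tensors="pt",
)
outputs = model(**inputs)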

@ArthurZucker (Collaborator)

Awesome! Hope you can share your findings with the community as well! 🚀

@molbap (Contributor) commented May 24, 2024

That's great @MrToy! If you need to pass a custom attention mask, you can also build the inputs entirely yourself:

import torch

# assuming you have `processor`, `formatted_messages`, and your images ready
inputs = processor(text=formatted_messages, images=..., return_tensors="pt", padding="longest")
# this won't set token_type_ids since you don't pass a suffix, but you can do
inputs["token_type_ids"] = torch.ones_like(inputs["input_ids"])

which will set all tokens that you send to be of type "1", that is, of type suffix. Then

outputs = model(**inputs)

should also work.
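
For the forward pass to return a loss you would still need labels; a hedged continuation of the snippet above (assuming you want to train on every text token):

# continuing the snippet above; train on all text tokens (assumption)
inputs["labels"] = inputs["input_ids"].clone()
outputs = model(**inputs)
print(outputs.loss)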
