Paligemma causal attention mask #30967
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Looks like you probably need `<eos>` added to the labels if it's being used the way that you're suggesting? Unless the idea is just to change that in all the tokenizer configs? Or maybe just add it to the example in the docstring?
This looks like a great cleanup!
Sorry to keep nitpicking while you're working on this, but our timezones don't overlap much, so I want to communicate while we have the chance! I think this PR should add the
No problem! I think we can add that eos at the end of the suffix, yes.
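For context, a minimal sketch of what appending EOS to the suffix could look like when the labels are built; the helper name and argument below are illustrative, not the actual processor code:

```python
def append_eos(suffix: str, tokenizer) -> str:
    """Append the tokenizer's EOS marker to the suffix so the labels teach the model when to stop."""
    return suffix + tokenizer.eos_token
```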
Just ran my finetune script to completion and everything is looking correct!!!
I just pushed the addition but nothing else! seems good to me :)
tokenizer.add_special_tokens(tokens_to_add)
tokenizer.add_tokens(EXTRA_TOKENS)
Adding twice is slow, but it's okay.
Suggested change:
- tokenizer.add_special_tokens(tokens_to_add)
- tokenizer.add_tokens(EXTRA_TOKENS)
+ tokenizer.add_tokens([image_token]+EXTRA_TOKENS)
But leaving it as is is fine.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@probicheaux the tokenization issue should be fixed, and all the rest! Feel free to test!
Thanks a lot @probicheaux for your work!
* PaliGemma working causal attention
* Formatting
* Style
* Docstrings + remove commented code
* Update docstring for PaliGemma Config
* PaliGemma - add separator ind to model/labels
* Refactor + docstring paligemma processor method
* Style
* return token type ids when tokenizing labels
* use token type ids when building causal mask
* add token type ids to tester
* remove separator from config
* fix style
* don't ignore separator
* add processor documentation
* simplify tokenization
* fix causal mask
* style
* fix label propagation, revert suffix naming
* fix style
* fix labels tokenization
* [run-slow]paligemma
* add eos if suffixes are present
* [run-slow]paligemma
* [run-slow]paligemma
* add misssing tokens to fast version
* Apply suggestions from code review

  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix style
* [run-slow]paligemma

Co-authored-by: Peter Robicheaux <peter@roboflow.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
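The commit list above mentions returning token type ids when tokenizing labels and using them to build the causal mask. A simplified sketch of that idea, not the actual implementation in transformers, where prefix/image positions get full attention and suffix (label) positions stay causal:

```python
import torch

# Simplified sketch of a prefix-LM attention mask built from token type ids:
# 0 = image/prefix tokens (attended bidirectionally), 1 = suffix/label tokens (causal).
def build_prefix_lm_mask(token_type_ids: torch.Tensor) -> torch.Tensor:
    seq_len = token_type_ids.shape[-1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    is_prefix = token_type_ids == 0                          # (batch, seq_len)
    allowed = causal.unsqueeze(0) | is_prefix.unsqueeze(1)   # prefix keys visible to every query
    return allowed                                           # True where attention is allowed

token_type_ids = torch.tensor([[0, 0, 0, 1, 1]])  # three prefix tokens, two suffix tokens
print(build_prefix_lm_mask(token_type_ids).int())
```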
Finetuned LoRA looks good
@ArthurZucker I still have the bug using this code:

    pred = vlm(pixel_values=tensor, input_ids=input_ids[:, :-1], attention_mask=torch.ones_like(input_ids[:, :-1])).logits
    pred = pred[:, -nb_tokens_answer:]
    loss = F.cross_entropy(pred.permute((0, 2, 1)), input_ids[:, -nb_tokens_answer:], reduction='mean')
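For comparison, a minimal sketch of computing the loss through labels built by the processor from a suffix, following the mechanism the commit list above describes; the keyword arguments and the `answer`/image values are assumptions, not taken from the comment above:

```python
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-pt-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("car.jpg")                # placeholder local image
prompt = "caption en"
answer = "a blue car parked on the street"   # placeholder target text

# Assumed API: passing `suffix` makes the processor return `labels` (and token type ids),
# so the model computes the shifted cross-entropy loss itself.
inputs = processor(text=prompt, suffix=answer, images=image, return_tensors="pt")
outputs = model(**inputs)
loss = outputs.loss
loss.backward()
```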
Can you share the traceback as well?
I am sure I am using transformers 4.41.1, yet the model doesn't seem causal to me. And here is the code again, copied and pasted:

import torch  # needed only for the commented-out decoding loops below
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image
import requests
model_id = "google/paligemma-3b-pt-224"
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)
prompt = "caption en"
model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# FIRST DEBUGGING METHOD ANALYZE INFLUENCE OF EXTRA TOKEN ON CAUSALITY
pred0 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
attention_mask=model_inputs['attention_mask'], ).logits
print((pred0, pred0[0, -2].argmax()))
pred1 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
attention_mask=model_inputs['attention_mask'], ).logits
print((pred1, pred1[0, -2].argmax())) # same before last token and logits
model_inputs['input_ids'][0,-1] *= 0 + 12 # modify last token
pred2 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
attention_mask=model_inputs['attention_mask'], ).logits
print((pred2, pred2[0, -2].argmax())) # different before last (penultimate) token and logits
# # SECOND DEBUGGING METHOD ANALYZE DECODING
# # THE FIRST TWO METHODS GIVE THE SAME RESULT
# input_len = model_inputs["input_ids"].shape[-1]
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# with torch.inference_mode():
# generation = model.generate(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
# max_new_tokens=1000, attention_mask=model_inputs['attention_mask'],
# do_sample=False)
# generation = generation[0][input_len:]
# decoded = processor.decode(generation, skip_special_tokens=False)
# print(decoded)
# #SECOND METHOD GIVING THE SAME RESULTS (manual greedy decoding)
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# eos_not_met = True
# while eos_not_met:
# # model_inputs['cache_position']
# # model_inputs['position_ids']
# with torch.inference_mode():
# pred = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
# attention_mask=model_inputs['attention_mask'], ).logits
# new_tok = torch.argmax(pred[0,-1])
# toks = torch.cat([model_inputs['input_ids'], new_tok[None][None]], dim=1)
# print(processor.decode(toks[0], skip_special_tokens=False))
# eos_not_met = new_tok.item() != processor.tokenizer.eos_token_id
# model_inputs['input_ids'] = toks
# model_inputs['attention_mask'] = torch.ones_like(model_inputs['input_ids'])
# #THIRD METHOD NOT GIVING THE RIGHT RESULT (manual greedy decoding with extra token in the end)
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# eos_not_met = True
# while eos_not_met:
# # model_inputs['cache_position']
# # model_inputs['position_ids']
# toks = torch.cat([model_inputs['input_ids'], new_tok[None][None]], dim=1) # add an extra token to interfere
# with torch.inference_mode():
# pred = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
# attention_mask=model_inputs['attention_mask'], ).logits
# new_tok = torch.argmax(pred[0,-2])
#         toks[0, -1] = new_tok  # replacing the interference token with the actual next token
# print(processor.decode(toks[0], skip_special_tokens=False))
# eos_not_met = new_tok.item() != processor.tokenizer.eos_token_id
# model_inputs['input_ids'] = toks
# model_inputs['attention_mask'] = torch.ones_like(model_inputs['input_ids'])
Hey @edmondja, thanks for your interest in this! Can you open a separate Issue with your reproducer, including expected outputs and obtained ones, and ping me on it? I can't find an exact reference to a bug in your current description.
Thank you for your help. I thought it was still the same issue; I will create a new one.
Continuation of #30918
cc @ArthurZucker!