
Paligemma causal attention mask #30967

Merged · 31 commits merged into main · May 22, 2024
Conversation

@molbap (Contributor) commented May 22, 2024

Continuation of #30918

cc @ArthurZucker !

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@probicheaux (Contributor) left a comment

Looks like you probably need <eos> added to the labels if it's being used the way you're suggesting? Unless the idea is just to change that in all the tokenizer configs, or maybe just to add it to the example in the docstring?

@ArthurZucker (Collaborator) left a comment

this looks like a great cleanup!

@probicheaux (Contributor) commented May 22, 2024

Sorry to keep nitpicking while you're working on this, but our timezones don't overlap much, so I want to communicate while we have the chance! I think this PR should add the <eos> token directly to the end of the suffix, just like it adds <bos> in the correct spot.

@molbap (Contributor, Author) commented May 22, 2024

No problem! I think we can add that eos at the end of the suffix, yes.
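The idea under discussion boils down to something like the following (a rough sketch, not the exact diff; the suffix string here is made up, and this only illustrates appending `<eos>` before tokenization):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-pt-224")

suffixes = ["a photo of a car"]  # made-up target text
# Append <eos> to each target string so the tokenized labels end with the eos token.
suffixes = [sfx + tokenizer.eos_token for sfx in suffixes]
labels = tokenizer(suffixes, add_special_tokens=False)["input_ids"]
print(labels[0][-1] == tokenizer.eos_token_id)  # True: labels now terminate with <eos>
```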

@probicheaux (Contributor) commented

Just ran my finetune script to completion and everything is looking correct!!!

@molbap (Contributor, Author) commented May 22, 2024

I just pushed the addition, nothing else! Seems good to me :)

Review comment on lines 110 to +111:

```python
tokenizer.add_special_tokens(tokens_to_add)
tokenizer.add_tokens(EXTRA_TOKENS)
```

@ArthurZucker (Collaborator) left a comment

Adding the tokens in two calls is slow, but it's okay.

Suggested change:

```diff
-tokenizer.add_special_tokens(tokens_to_add)
-tokenizer.add_tokens(EXTRA_TOKENS)
+tokenizer.add_tokens([image_token] + EXTRA_TOKENS)
```

@ArthurZucker (Collaborator) left a comment

But leaving it as is is fine.

molbap and others added 3 commits May 22, 2024 19:29
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@ArthurZucker (Collaborator) commented

@probicheaux the tokenization issue should be fixed, and all the rest! Feel free to test!
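A quick way to check the fix looks something like this (a sketch, not an official test; the suffix string is made up and the exact output keys may differ slightly):

```python
from transformers import AutoProcessor
from PIL import Image
import requests

model_id = "google/paligemma-3b-pt-224"
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoProcessor.from_pretrained(model_id)
# With a suffix, the processor should now tokenize the target text with a
# trailing <eos> and return token type ids separating prefix (0) and suffix (1).
inputs = processor(text="caption en", images=image, suffix="a photo of a car", return_tensors="pt")
print(inputs["input_ids"][0, -1].item() == processor.tokenizer.eos_token_id)
print(inputs["token_type_ids"][0])
```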

@molbap molbap merged commit a25f7d3 into main May 22, 2024
23 checks passed
@molbap molbap deleted the paligemma-causal-attention-mask branch May 22, 2024 17:37
@molbap (Contributor, Author) commented May 22, 2024

Thanks a lot @probicheaux for your work!

ArthurZucker added a commit that referenced this pull request May 22, 2024
* PaliGemma working causal attention

* Formatting

* Style

* Docstrings + remove commented code

* Update docstring for PaliGemma Config

* PaliGemma - add separator ind to model/labels

* Refactor + docstring paligemma processor method

* Style

* return token type ids when tokenizing labels

* use token type ids when building causal mask

* add token type ids to tester

* remove separator from config

* fix style

* don't ignore separator

* add processor documentation

* simplify tokenization

* fix causal mask

* style

* fix label propagation, revert suffix naming

* fix style

* fix labels tokenization

* [run-slow]paligemma

* add eos if suffixes are present

* [run-slow]paligemma

* [run-slow]paligemma

* add missing tokens to fast version

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix style

* [run-slow]paligemma

---------

Co-authored-by: Peter Robicheaux <peter@roboflow.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
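As context for the "use token type ids when building causal mask" step in the commit list above, the idea is a prefix-LM mask: image and prompt tokens attend to each other bidirectionally, while suffix tokens are masked causally. A simplified sketch (not the PR's literal implementation; the 0 = prefix, 1 = suffix convention is assumed):

```python
import torch

def build_prefix_lm_mask(token_type_ids: torch.Tensor) -> torch.Tensor:
    """Boolean "may attend" mask: full attention over prefix positions
    (token_type_ids == 0), causal attention over suffix positions (== 1)."""
    seq_len = token_type_ids.shape[1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=token_type_ids.device))
    prefix_keys = (token_type_ids == 0).unsqueeze(1)  # (batch, 1, seq_len), broadcast over queries
    # A query may attend to every prefix key, and to suffix keys up to its own position.
    return causal.unsqueeze(0) | prefix_keys          # (batch, seq_len, seq_len)

# Example: 3 prefix tokens (image/prompt) followed by 2 suffix tokens.
print(build_prefix_lm_mask(torch.tensor([[0, 0, 0, 1, 1]])).int())
```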
@probicheaux (Contributor) commented

Finetuned LoRA looks good

@edmondja commented May 23, 2024

@ArthurZucker I still have the bug using this code:

```python
import torch
import torch.nn.functional as F
from transformers import PaliGemmaForConditionalGeneration

# llm_args, tensor, input_ids and nb_tokens_answer are defined elsewhere
vlm = PaliGemmaForConditionalGeneration.from_pretrained(**llm_args)

pred = vlm(pixel_values=tensor, input_ids=input_ids[:, :-1],
           attention_mask=torch.ones_like(input_ids[:, :-1])).logits
pred = pred[:, -nb_tokens_answer:]

loss = F.cross_entropy(pred.permute((0, 2, 1)), input_ids[:, -nb_tokens_answer:], reduction='mean')
```
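For reference, with the changes in this PR the same supervision can also be expressed through the processor's `suffix` argument (a sketch with placeholder `prompt`, `answer`, `image`, `processor` and `model`; not a drop-in replacement for the snippet above):

```python
# Sketch: let the processor build `labels` from the suffix; prefix and image
# positions are masked out of the loss, and the model returns it directly.
inputs = processor(text=prompt, images=image, suffix=answer, return_tensors="pt")
outputs = model(**inputs)
loss = outputs.loss
```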

@ArthurZucker (Collaborator) commented

Can you share the traceback as well?

@edmondja commented May 23, 2024

> Can you share the traceback as well?

I am sure I am using transformers 4.41.1, yet the model doesn't seem causal to me.
I am attaching the zipped code and console results here: https://filedn.eu/lB85dAuefYtHyPANtFCdHfB/arthur_debug_attempt.zip (also attached as arthur_debug_attempt.zip).

And here is the code again, copied and pasted:

```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "google/paligemma-3b-pt-224"

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

prompt = "caption en"
model_inputs = processor(text=prompt, images=image, return_tensors="pt")

# FIRST DEBUGGING METHOD: ANALYZE THE INFLUENCE OF AN EXTRA TOKEN ON CAUSALITY
pred0 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
              attention_mask=model_inputs['attention_mask']).logits
print((pred0, pred0[0, -2].argmax()))
pred1 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
              attention_mask=model_inputs['attention_mask']).logits
print((pred1, pred1[0, -2].argmax()))  # same penultimate token and logits
model_inputs['input_ids'][0, -1] *= 0 + 12  # modify the last token
pred2 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
              attention_mask=model_inputs['attention_mask']).logits
print((pred2, pred2[0, -2].argmax()))  # different penultimate token and logits


# # SECOND DEBUGGING METHOD: ANALYZE DECODING
# # THE FIRST TWO METHODS GIVE THE SAME RESULT
# input_len = model_inputs["input_ids"].shape[-1]
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# with torch.inference_mode():
#     generation = model.generate(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
#                                 max_new_tokens=1000, attention_mask=model_inputs['attention_mask'],
#                                 do_sample=False)
#     generation = generation[0][input_len:]
#     decoded = processor.decode(generation, skip_special_tokens=False)
#     print(decoded)

# # SECOND METHOD GIVING THE SAME RESULTS (manual greedy decoding)
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# eos_not_met = True
# while eos_not_met:
#     # model_inputs['cache_position']
#     # model_inputs['position_ids']
#     with torch.inference_mode():
#         pred = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
#                      attention_mask=model_inputs['attention_mask']).logits
#     new_tok = torch.argmax(pred[0, -1])
#
#     toks = torch.cat([model_inputs['input_ids'], new_tok[None][None]], dim=1)
#     print(processor.decode(toks[0], skip_special_tokens=False))
#
#     eos_not_met = new_tok.item() != processor.tokenizer.eos_token_id
#     model_inputs['input_ids'] = toks
#     model_inputs['attention_mask'] = torch.ones_like(model_inputs['input_ids'])

# # THIRD METHOD NOT GIVING THE RIGHT RESULT (manual greedy decoding with an extra token at the end)
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# eos_not_met = True
# while eos_not_met:
#     # model_inputs['cache_position']
#     # model_inputs['position_ids']
#     toks = torch.cat([model_inputs['input_ids'], new_tok[None][None]], dim=1)  # add an extra token to interfere
#     with torch.inference_mode():
#         pred = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
#                      attention_mask=model_inputs['attention_mask']).logits
#     new_tok = torch.argmax(pred[0, -2])
#     toks[0, -1] = new_tok  # replace the interference token with the actual next token
#
#     print(processor.decode(toks[0], skip_special_tokens=False))
#
#     eos_not_met = new_tok.item() != processor.tokenizer.eos_token_id
#     model_inputs['input_ids'] = toks
#     model_inputs['attention_mask'] = torch.ones_like(model_inputs['input_ids'])
```

@molbap (Contributor, Author) commented May 23, 2024

Hey @edmondja, thanks for your interest in this! Can you open a separate issue with your reproducer, including expected and obtained outputs, and ping me on it? I can't find an exact reference to a bug in your current description.

@edmondja commented

Thank you for your help, @molbap. I thought it was still the same issue; I will create a new one.
