
Paligemma causal attention mask #30967

Merged · 31 commits merged into main · May 22, 2024
Conversation

@molbap (Contributor) commented May 22, 2024

Continuation of #30918

cc @ArthurZucker !

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@probicheaux (Contributor) left a comment

Looks like you probably need <eos> added to the labels if it's being used the way you're suggesting? Unless the idea is just to change that in all the tokenizer configs, or maybe just to add it to the example in the docstring?

@ArthurZucker (Collaborator) left a comment

this looks like a great cleanup!

@probicheaux (Contributor) commented May 22, 2024

Sorry to keep nitpicking while you're working on this, but our timezones don't overlap much, so I want to communicate while we have the chance! I think this PR should add the <eos> token directly to the end of the suffix, just like it adds <bos> in the correct spot.

@molbap (Contributor, Author) commented May 22, 2024

No problem! I think we can add that eos at the end of the suffix, yes.
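The idea under discussion boils down to something like the following (a rough sketch, not the exact diff; the suffix string here is made up, and this only illustrates appending `<eos>` before tokenization):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/paligemma-3b-pt-224")

suffixes = ["a photo of a car"]  # made-up target text
# Append <eos> to each target string so the tokenized labels end with the eos token.
suffixes = [sfx + tokenizer.eos_token for sfx in suffixes]
labels = tokenizer(suffixes, add_special_tokens=False)["input_ids"]
print(labels[0][-1] == tokenizer.eos_token_id)  # True: labels now terminate with <eos>
```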

@probicheaux (Contributor) commented

Just ran my finetune script to completion and everything is looking correct!!!

@molbap (Contributor, Author) commented May 22, 2024

I just pushed the addition, nothing else! Seems good to me :)

Review comment on lines 110 to +111:

```python
tokenizer.add_special_tokens(tokens_to_add)
tokenizer.add_tokens(EXTRA_TOKENS)
```

@ArthurZucker (Collaborator) left a comment

Adding the tokens in two calls is slow, but it's okay.

Suggested change:

```diff
-tokenizer.add_special_tokens(tokens_to_add)
-tokenizer.add_tokens(EXTRA_TOKENS)
+tokenizer.add_tokens([image_token] + EXTRA_TOKENS)
```

@ArthurZucker (Collaborator) left a comment

But leaving it as is is fine.

molbap and others added 3 commits May 22, 2024 19:29
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@ArthurZucker (Collaborator) commented

@probicheaux the tokenization issue should be fixed, and all the rest! Feel free to test!
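A quick way to check the fix looks something like this (a sketch, not an official test; the suffix string is made up and the exact output keys may differ slightly):

```python
from transformers import AutoProcessor
from PIL import Image
import requests

model_id = "google/paligemma-3b-pt-224"
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoProcessor.from_pretrained(model_id)
# With a suffix, the processor should now tokenize the target text with a
# trailing <eos> and return token type ids separating prefix (0) and suffix (1).
inputs = processor(text="caption en", images=image, suffix="a photo of a car", return_tensors="pt")
print(inputs["input_ids"][0, -1].item() == processor.tokenizer.eos_token_id)
print(inputs["token_type_ids"][0])
```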

@molbap molbap merged commit a25f7d3 into main May 22, 2024
23 checks passed
@molbap molbap deleted the paligemma-causal-attention-mask branch May 22, 2024 17:37
@molbap (Contributor, Author) commented May 22, 2024

Thanks a lot @probicheaux for your work!

ArthurZucker added a commit that referenced this pull request May 22, 2024
* PaliGemma working causal attention

* Formatting

* Style

* Docstrings + remove commented code

* Update docstring for PaliGemma Config

* PaliGemma - add separator ind to model/labels

* Refactor + docstring paligemma processor method

* Style

* return token type ids when tokenizing labels

* use token type ids when building causal mask

* add token type ids to tester

* remove separator from config

* fix style

* don't ignore separator

* add processor documentation

* simplify tokenization

* fix causal mask

* style

* fix label propagation, revert suffix naming

* fix style

* fix labels tokenization

* [run-slow]paligemma

* add eos if suffixes are present

* [run-slow]paligemma

* [run-slow]paligemma

* add missing tokens to fast version

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix style

* [run-slow]paligemma

---------

Co-authored-by: Peter Robicheaux <peter@roboflow.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
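As context for the "use token type ids when building causal mask" step in the commit list above, the idea is a prefix-LM mask: image and prompt tokens attend to each other bidirectionally, while suffix tokens are masked causally. A simplified sketch (not the PR's literal implementation; the 0 = prefix, 1 = suffix convention is assumed):

```python
import torch

def build_prefix_lm_mask(token_type_ids: torch.Tensor) -> torch.Tensor:
    """Boolean "may attend" mask: full attention over prefix positions
    (token_type_ids == 0), causal attention over suffix positions (== 1)."""
    seq_len = token_type_ids.shape[1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=token_type_ids.device))
    prefix_keys = (token_type_ids == 0).unsqueeze(1)  # (batch, 1, seq_len), broadcast over queries
    # A query may attend to every prefix key, and to suffix keys up to its own position.
    return causal.unsqueeze(0) | prefix_keys          # (batch, seq_len, seq_len)

# Example: 3 prefix tokens (image/prompt) followed by 2 suffix tokens.
print(build_prefix_lm_mask(torch.tensor([[0, 0, 0, 1, 1]])).int())
```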
@probicheaux (Contributor) commented

Finetuned LoRA looks good

@edmondja commented May 23, 2024

@ArthurZucker I still have the bug using this code:

```python
import torch
import torch.nn.functional as F
from transformers import PaliGemmaForConditionalGeneration

# llm_args, tensor, input_ids and nb_tokens_answer are defined elsewhere
vlm = PaliGemmaForConditionalGeneration.from_pretrained(**llm_args)

pred = vlm(pixel_values=tensor, input_ids=input_ids[:, :-1],
           attention_mask=torch.ones_like(input_ids[:, :-1])).logits
pred = pred[:, -nb_tokens_answer:]

loss = F.cross_entropy(pred.permute((0, 2, 1)), input_ids[:, -nb_tokens_answer:], reduction='mean')
```
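For reference, with the changes in this PR the same supervision can also be expressed through the processor's `suffix` argument (a sketch with placeholder `prompt`, `answer`, `image`, `processor` and `model`; not a drop-in replacement for the snippet above):

```python
# Sketch: let the processor build `labels` from the suffix; prefix and image
# positions are masked out of the loss, and the model returns it directly.
inputs = processor(text=prompt, images=image, suffix=answer, return_tensors="pt")
outputs = model(**inputs)
loss = outputs.loss
```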

@ArthurZucker (Collaborator) commented

Can you share the traceback as well?

@edmondja commented May 23, 2024

> Can you share the traceback as well?

I am sure I am using transformers 4.41.1, yet the model doesn't seem causal to me.
I am attaching the zipped code and console results here: https://filedn.eu/lB85dAuefYtHyPANtFCdHfB/arthur_debug_attempt.zip (also attached as arthur_debug_attempt.zip).

And here is the code again, copied and pasted:

```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "google/paligemma-3b-pt-224"

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

prompt = "caption en"
model_inputs = processor(text=prompt, images=image, return_tensors="pt")

# FIRST DEBUGGING METHOD: ANALYZE THE INFLUENCE OF AN EXTRA TOKEN ON CAUSALITY
pred0 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
              attention_mask=model_inputs['attention_mask']).logits
print((pred0, pred0[0, -2].argmax()))
pred1 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
              attention_mask=model_inputs['attention_mask']).logits
print((pred1, pred1[0, -2].argmax()))  # same penultimate token and logits
model_inputs['input_ids'][0, -1] *= 0 + 12  # modify the last token
pred2 = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
              attention_mask=model_inputs['attention_mask']).logits
print((pred2, pred2[0, -2].argmax()))  # different penultimate token and logits


# # SECOND DEBUGGING METHOD: ANALYZE DECODING
# # THE FIRST TWO METHODS GIVE THE SAME RESULT
# input_len = model_inputs["input_ids"].shape[-1]
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# with torch.inference_mode():
#     generation = model.generate(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
#                                 max_new_tokens=1000, attention_mask=model_inputs['attention_mask'],
#                                 do_sample=False)
#     generation = generation[0][input_len:]
#     decoded = processor.decode(generation, skip_special_tokens=False)
#     print(decoded)

# # SECOND METHOD GIVING THE SAME RESULTS (manual greedy decoding)
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# eos_not_met = True
# while eos_not_met:
#     # model_inputs['cache_position']
#     # model_inputs['position_ids']
#     with torch.inference_mode():
#         pred = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
#                      attention_mask=model_inputs['attention_mask']).logits
#     new_tok = torch.argmax(pred[0, -1])
#
#     toks = torch.cat([model_inputs['input_ids'], new_tok[None][None]], dim=1)
#     print(processor.decode(toks[0], skip_special_tokens=False))
#
#     eos_not_met = new_tok.item() != processor.tokenizer.eos_token_id
#     model_inputs['input_ids'] = toks
#     model_inputs['attention_mask'] = torch.ones_like(model_inputs['input_ids'])

# # THIRD METHOD NOT GIVING THE RIGHT RESULT (manual greedy decoding with an extra token at the end)
# model_inputs = processor(text=prompt, images=image, return_tensors="pt")
# eos_not_met = True
# while eos_not_met:
#     # model_inputs['cache_position']
#     # model_inputs['position_ids']
#     toks = torch.cat([model_inputs['input_ids'], new_tok[None][None]], dim=1)  # add an extra token to interfere
#     with torch.inference_mode():
#         pred = model(input_ids=model_inputs['input_ids'], pixel_values=model_inputs['pixel_values'],
#                      attention_mask=model_inputs['attention_mask']).logits
#     new_tok = torch.argmax(pred[0, -2])
#     toks[0, -1] = new_tok  # replace the interference token with the actual next token
#
#     print(processor.decode(toks[0], skip_special_tokens=False))
#
#     eos_not_met = new_tok.item() != processor.tokenizer.eos_token_id
#     model_inputs['input_ids'] = toks
#     model_inputs['attention_mask'] = torch.ones_like(model_inputs['input_ids'])
```

@molbap (Contributor, Author) commented May 23, 2024

Hey @edmondja, thanks for your interest in this! Can you open a separate issue with your reproducer, including expected and obtained outputs, and ping me on it? I can't find an exact reference to a bug in your current description.

@edmondja commented

Thank you for your help, @molbap. I thought it was still the same issue; I will create a new one.
