Weights of BlipModel are not initialized from the model checkpoint #25024
Comments
Also cc @ydshieh who was just discussing this internally :-)
Hi @Vibhu04
Hi @younesbelkada, thanks a lot for your prompt reply. I actually want to compute the image-text similarity score given an input image and a text, and I was hoping I could use BlipModel for this.
Thanks for your reply @Vibhu04

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForImageTextRetrieval

processor = BlipProcessor.from_pretrained("Salesforce/blip-itm-base-coco")
model = BlipForImageTextRetrieval.from_pretrained("Salesforce/blip-itm-base-coco")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "A woman and a dog sitting together on a beach."
inputs = processor(raw_image, question, return_tensors="pt")

# Image-text matching score from the ITM classification head
itm_scores = model(**inputs)[0]
# Raw cosine similarity between the projected image and text features
cosine_score = model(**inputs, use_itm_head=False)[0]
```
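As an editorial aside (not part of the original reply): the ITM head emits two logits per image-text pair, so a softmax over them yields a match probability, while `use_itm_head=False` returns the raw cosine similarity. A minimal sketch for interpreting both outputs above:

```python
import torch

# The ITM head output has shape (1, 2); index 1 is the "match" class.
match_probability = torch.softmax(itm_scores, dim=1)[:, 1].item()
print(f"ITM match probability: {match_probability:.4f}")

# use_itm_head=False returns the cosine similarity of the projected features.
print(f"Cosine similarity: {cosine_score.item():.4f}")
```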
Hi @younesbelkada, thank you so much. If I may, I just have one last question: is there a lighter variant (i.e. fewer parameters) of the model that you mentioned? Thanks a lot.
Hi @Vibhu04

```python
import requests
from PIL import Image
import torch
from transformers import BlipProcessor, BlipForImageTextRetrieval

processor = BlipProcessor.from_pretrained("Salesforce/blip-itm-base-coco")
# Load the same checkpoint in bfloat16 to roughly halve its memory footprint
model = BlipForImageTextRetrieval.from_pretrained(
    "Salesforce/blip-itm-base-coco", torch_dtype=torch.bfloat16
)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "A woman and a dog sitting together on a beach."
# Cast the inputs to bfloat16 to match the model dtype
inputs = processor(raw_image, question, return_tensors="pt").to(torch.bfloat16)

itm_scores = model(**inputs)[0]
cosine_score = model(**inputs, use_itm_head=False)[0]
```
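To see what the bfloat16 loading actually saves, here is a quick editorial sketch (not from the thread) that reports the model's parameter count and in-memory size:

```python
# Parameters stay the same; each bfloat16 parameter takes 2 bytes instead of 4.
num_params = sum(p.numel() for p in model.parameters())
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2
print(f"{num_params / 1e6:.1f}M parameters, ~{size_mb:.0f} MB in bfloat16")
```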
Thank you so much for your help @younesbelkada!
System Info

transformers version: 4.31.0.dev0

Who can help?

@younesbelkada @ArthurZucker @amyeroberts @ydshieh

Information

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction

The code snippet is an example from https://huggingface.co/docs/transformers/model_doc/blip#transformers.BlipProcessor.
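The issue does not quote the snippet itself, but judging from the checkpoint named in the warning, the reproduction presumably resembles the following sketch:

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipModel

# Presumed reproduction: loading a captioning checkpoint into the generic
# BlipModel class, which triggers the "newly initialized" warning quoted below.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipModel.from_pretrained("Salesforce/blip-image-captioning-base")

img_url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
inputs = processor(raw_image, "A woman and a dog on a beach", return_tensors="pt")
outputs = model(**inputs)
```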
Expected behavior

The warning that I get is:
Some weights of BlipModel were not initialized from the model checkpoint at Salesforce/blip-image-captioning-base and are newly initialized: ['text_model.encoder.layer.10.crossattention.output.dense.weight', 'text_model.encoder.layer.4.attention.output.LayerNorm.bias', 'text_model.encoder.layer.2.intermediate.dense.bias', 'text_model.encoder.layer.1.attention.self.value.bias', 'text_model.encoder.layer.5.attention.output.LayerNorm.bias', 'text_model.encoder.layer.2.attention.output.dense.bias', 'text_model.encoder.layer.1.crossattention.self.key.weight', 'text_model.encoder.layer.5.crossattention.self.key.bias', 'text_model.encoder.layer.11.crossattention.output.LayerNorm.bias', 'text_model.encoder.layer.1.attention.self.value.weight', 'text_model.encoder.layer.8.attention.self.key.weight', 'text_model.encoder.layer.9.crossattention.output.dense.bias', 'text_model.encoder.layer.7.crossattention.self.key.bias', 'text_model.encoder.layer.1.attention.output.dense.bias', 'text_model.encoder.layer.8.output.LayerNorm.bias', ...
It seems that the model weights are being initialized anew because something goes wrong when loading the pre-trained weights. Please guide me in solving this issue.
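Editorial note, drawing on the replies above rather than on text from the issue body: the warning appears because `Salesforce/blip-image-captioning-base` was exported for the captioning task class, so loading it into the generic `BlipModel` leaves many text-model weights newly initialized rather than loaded from the checkpoint. Loading the checkpoint with its matching task class avoids the warning:

```python
from transformers import BlipProcessor, BlipForConditionalGeneration

# The captioning checkpoint pairs with BlipForConditionalGeneration,
# which consumes all of its weights and loads without the warning.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
```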