An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
from transformers import pipeline

urls = [
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/tree.png",
]
oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa")
oracle(question="What's in the image?", image=urls, top_k=1)
(Truncated) error:
TypeError                                 Traceback (most recent call last)
Cell In[1], line 11
      8 oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa", image_processor=image_processor)
      9 # for out in tqdm(oracle(question="What's in this image", image=dataset, top_k=1)):
     10 #     print(out)
---> 11 oracle(question="What's in this image", image=urls, top_k=1)

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/visual_question_answering.py:114, in VisualQuestionAnsweringPipeline.__call__(self, image, question, **kwargs)
    107 """
    108 Supports the following format
    109 - {"image": image, "question": question}
    110 - [{"image": image, "question": question}]
    111 - Generator and datasets
    112 """
    113 inputs = image
--> 114 results = super().__call__(inputs, **kwargs)
    115 return results

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/base.py:1224, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1220 if can_use_iterator:
   1221     final_iterator = self.get_iterator(
   1222         inputs, num_workers, batch_size, preprocess_params, forward_params, postprocess_params
   1223     )
-> 1224 outputs = list(final_iterator)
...
    120     inputs["question"], return_tensors=self.framework, padding=padding, truncation=truncation
    121 )
    122 image_features = self.image_processor(images=image, return_tensors=self.framework)

TypeError: string indices must be integers
This error is reproducible on the latest version (v4.41.2)
Expected behavior
The pipeline should broadcast the same question to all images and run the model on each image-question pair, as per the documentation.
Note: This currently works, but it is not as easy to use as passing the lists directly (and this doesn't allow passing the dataset directly like this):
oracle([{"question": "What's in the image?", "image": url} for url in urls])
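Until lists are accepted directly, the workaround above can be wrapped in a small helper. This is only a sketch: `iter_vqa_inputs` is a hypothetical name, not part of `transformers`, and it assumes the pipeline keeps accepting a list of `{"question", "image"}` dicts as in the call above.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_vqa_inputs(question: str, images: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one {"question", "image"} dict per image, reusing the same question.

    Hypothetical helper for illustration; it is not a transformers API.
    """
    for image in images:
        yield {"question": question, "image": image}


urls = [
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/tree.png",
]

# Materialize the pairs as a list, the input format that currently works.
inputs = list(iter_vqa_inputs("What's in the image?", urls))

# With `oracle` being the pipeline from the reproduction above:
# oracle(inputs, top_k=1)
```

Since `images` can be any iterable, the same helper could in principle wrap rows of a dataset, but that path is untested here.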
Currently, the call function only handles a single image-question pair as input. I can make a quick PR so that it also handles lists of images and questions. I have no idea about the dataset part, though.
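For discussion, here is a rough sketch of what normalizing the `image` and `question` arguments could look like before they reach the base pipeline. `normalize_vqa_inputs` is a hypothetical function, not the actual `transformers` code, and pairing every question with every image when both are lists is just one possible choice (an assumption on my side). Strings stand in for images so the snippet runs on its own.

```python
from typing import Any, Dict, List, Union

VqaInput = Dict[str, Any]


def normalize_vqa_inputs(
    image: Union[Any, List[Any]],
    question: Union[str, List[str]],
) -> Union[VqaInput, List[VqaInput]]:
    """Turn (image, question) arguments into the dict format the pipeline already accepts.

    Hypothetical sketch, not the library implementation:
    - single image + single question -> one dict
    - a list on either side -> the single value is broadcast across the list
    - lists on both sides -> every question paired with every image (assumed behavior)
    """
    image_is_batch = isinstance(image, list)
    question_is_batch = isinstance(question, list)

    images = image if image_is_batch else [image]
    questions = question if question_is_batch else [question]

    pairs = [{"image": im, "question": q} for q in questions for im in images]

    # Keep the single-pair call shape unchanged.
    if not image_is_batch and not question_is_batch:
        return pairs[0]
    return pairs


single = normalize_vqa_inputs("parrots.png", "What's in the image?")
broadcast = normalize_vqa_inputs(["parrots.png", "tree.png"], "What's in the image?")
product = normalize_vqa_inputs(["parrots.png", "tree.png"], ["What is it?", "What color?"])
```

Datasets and generators are left out on purpose, since they cannot be checked with `isinstance(..., list)` and would need separate handling.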
System Info
transformers version: 4.42.0.dev0
Who can help?
@Narsil @sijunhe