
[pipeline] VQA pipeline does not accept list as input #31216

Closed · BlacCod opened this issue Jun 3, 2024 · 1 comment · Fixed by #31217
Labels: Core: Pipeline · Feature request

Comments

BlacCod (Contributor) commented on Jun 3, 2024

System Info

  • transformers version: 4.42.0.dev0
  • Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no

Who can help?

@Narsil @sijunhe

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
from transformers import pipeline

urls = [
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/tree.png",
]

oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa")
oracle(question="What's in the image?", image=urls, top_k=1)
```

(Truncated) error:

```
TypeError                                 Traceback (most recent call last)
Cell In[1], line 11
      8 oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa", image_processor=image_processor)
      9 # for out in tqdm(oracle(question="What's in this image", image=dataset, top_k=1)):
     10 #     print(out)
---> 11 oracle(question="What's in this image", image=urls, top_k=1)

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/visual_question_answering.py:114, in VisualQuestionAnsweringPipeline.__call__(self, image, question, **kwargs)
    107     """
    108     Supports the following format
    109     - {"image": image, "question": question}
    110     - [{"image": image, "question": question}]
    111     - Generator and datasets
    112     """
    113     inputs = image
--> 114 results = super().__call__(inputs, **kwargs)
    115 return results

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/base.py:1224, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1220 if can_use_iterator:
   1221     final_iterator = self.get_iterator(
   1222         inputs, num_workers, batch_size, preprocess_params, forward_params, postprocess_params
   1223     )
-> 1224     outputs = list(final_iterator)
...
    120         inputs["question"], return_tensors=self.framework, padding=padding, truncation=truncation
    121     )
    122     image_features = self.image_processor(images=image, return_tensors=self.framework)

TypeError: string indices must be integers
```

This error is also reproducible on the latest release (v4.41.2).
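The failure mode appears to be that `__call__` forwards the list of URLs directly to `Pipeline.__call__` (`inputs = image` at line 113 above), which iterates over the list, so `preprocess` receives a bare URL string where it expects a `{"image": ..., "question": ...}` dict and `inputs["question"]` indexes into the string. A minimal illustration of the underlying Python error, independent of transformers:

```python
# preprocess effectively does inputs["question"], but when a list of URLs is
# passed through, each element it sees is a plain string, not a dict.
inputs = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png"
try:
    inputs["question"]
except TypeError as e:
    print(e)  # string indices must be integers
```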

Expected behavior

The pipeline should broadcast the same question across all images and run the model on each image-question pair, as per the documentation.

Note: the following currently works, but it is not as convenient as passing the list directly (and it does not allow passing a dataset directly either):

```python
oracle([{"question": "What's in the image?", "image": url} for url in urls])
```
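For a dataset, a similar workaround is to wrap it in a generator that yields the dicts the pipeline already accepts (the docstring lists "Generator and datasets" as supported input). `dataset` and its `"image"` column below are assumptions for illustration, not part of the pipeline API:

```python
# Hypothetical workaround: stream {"image", "question"} dicts through a
# generator so the pipeline can iterate lazily over a dataset.
def pairs(dataset, question):
    for example in dataset:
        yield {"image": example["image"], "question": question}

for out in oracle(pairs(dataset, "What's in the image?"), top_k=1):
    print(out)
```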
BlacCod (Contributor, Author) commented on Jun 3, 2024

Currently, the `__call__` function only handles one image-question pair as input. I can make a quick PR to make it also handle lists of images and questions; I have no idea about the dataset part, though. A sketch of the kind of broadcasting logic is shown below.
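One possible shape for that change, shown here as a standalone helper, is to normalize list inputs into the dict format `preprocess` already understands before delegating to the base class. This is a sketch only, not necessarily what PR #31217 merged:

```python
# Hypothetical helper sketching the proposed broadcasting; not the actual
# code of PR #31217.
def normalize_vqa_inputs(image, question):
    if isinstance(image, (list, tuple)):
        if isinstance(question, str):
            # Broadcast a single question across all images.
            return [{"image": img, "question": question} for img in image]
        # Pair images with questions element-wise.
        return [{"image": img, "question": q} for img, q in zip(image, question)]
    return {"image": image, "question": question}
```

`VisualQuestionAnsweringPipeline.__call__` could then delegate with `super().__call__(normalize_vqa_inputs(image, question), **kwargs)` instead of passing `image` through unchanged.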
