
[pipeline] VQA pipeline does not accept list as input #31216

Closed · BlacCod opened this issue Jun 3, 2024 · 1 comment · Fixed by #31217
Labels: Core: Pipeline · Feature request

Comments

BlacCod (Contributor) commented on Jun 3, 2024

System Info

  • transformers version: 4.42.0.dev0
  • Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no

Who can help?

@Narsil @sijunhe

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

```python
from transformers import pipeline

urls = [
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
    "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/tree.png",
]

oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa")
oracle(question="What's in the image?", image=urls, top_k=1)
```

(Truncated) error:

```
TypeError                                 Traceback (most recent call last)
Cell In[1], line 11
      8 oracle = pipeline(task="vqa", model="dandelin/vilt-b32-finetuned-vqa", image_processor=image_processor)
      9 # for out in tqdm(oracle(question="What's in this image", image=dataset, top_k=1)):
     10 #     print(out)
---> 11 oracle(question="What's in this image", image=urls, top_k=1)

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/visual_question_answering.py:114, in VisualQuestionAnsweringPipeline.__call__(self, image, question, **kwargs)
    107     """
    108     Supports the following format
    109     - {"image": image, "question": question}
    110     - [{"image": image, "question": question}]
    111     - Generator and datasets
    112     """
    113     inputs = image
--> 114 results = super().__call__(inputs, **kwargs)
    115 return results

File ~/dev/lab24-env/lib/python3.10/site-packages/transformers/pipelines/base.py:1224, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1220 if can_use_iterator:
   1221     final_iterator = self.get_iterator(
   1222         inputs, num_workers, batch_size, preprocess_params, forward_params, postprocess_params
   1223     )
-> 1224     outputs = list(final_iterator)
...
    120         inputs["question"], return_tensors=self.framework, padding=padding, truncation=truncation
    121     )
    122     image_features = self.image_processor(images=image, return_tensors=self.framework)

TypeError: string indices must be integers
```

This error is also reproducible on the latest release (v4.41.2).
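The failure mode appears to be that `__call__` forwards the list of URLs directly to `Pipeline.__call__` (`inputs = image` at line 113 above), which iterates over the list, so `preprocess` receives a bare URL string where it expects a `{"image": ..., "question": ...}` dict and `inputs["question"]` indexes into the string. A minimal illustration of the underlying Python error, independent of transformers:

```python
# preprocess effectively does inputs["question"], but when a list of URLs is
# passed through, each element it sees is a plain string, not a dict.
inputs = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png"
try:
    inputs["question"]
except TypeError as e:
    print(e)  # string indices must be integers
```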

Expected behavior

The pipeline should broadcast the same question across all images and run the model on each image-question pair, as per the documentation.

Note: the following currently works, but it is not as convenient as passing the list directly (and it does not allow passing a dataset directly either):

```python
oracle([{"question": "What's in the image?", "image": url} for url in urls])
```
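For a dataset, a similar workaround is to wrap it in a generator that yields the dicts the pipeline already accepts (the docstring lists "Generator and datasets" as supported input). `dataset` and its `"image"` column below are assumptions for illustration, not part of the pipeline API:

```python
# Hypothetical workaround: stream {"image", "question"} dicts through a
# generator so the pipeline can iterate lazily over a dataset.
def pairs(dataset, question):
    for example in dataset:
        yield {"image": example["image"], "question": question}

for out in oracle(pairs(dataset, "What's in the image?"), top_k=1):
    print(out)
```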
BlacCod (Contributor, Author) commented on Jun 3, 2024

Currently, the `__call__` function only handles one image-question pair as input. I can make a quick PR to make it also handle lists of images and questions; I have no idea about the dataset part, though. A sketch of the kind of broadcasting logic is shown below.
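One possible shape for that change, shown here as a standalone helper, is to normalize list inputs into the dict format `preprocess` already understands before delegating to the base class. This is a sketch only, not necessarily what PR #31217 merged:

```python
# Hypothetical helper sketching the proposed broadcasting; not the actual
# code of PR #31217.
def normalize_vqa_inputs(image, question):
    if isinstance(image, (list, tuple)):
        if isinstance(question, str):
            # Broadcast a single question across all images.
            return [{"image": img, "question": question} for img in image]
        # Pair images with questions element-wise.
        return [{"image": img, "question": q} for img, q in zip(image, question)]
    return {"image": image, "question": question}
```

`VisualQuestionAnsweringPipeline.__call__` could then delegate with `super().__call__(normalize_vqa_inputs(image, question), **kwargs)` instead of passing `image` through unchanged.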
