
Chat template: return vectorized output in processors #34275

Merged: 42 commits merged into huggingface:main on Jan 10, 2025

Conversation

@zucchini-nlp (Member)

What does this PR do?

Part of #33948. This PR adds support for `return_tensors="pt"` when calling chat templates on processors. That way users can obtain inputs in tensor format and pass them directly to the model, instead of having to call the processor with a formatted prompt plus visuals.
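As a quick illustration, here is a minimal usage sketch. The checkpoint and the exact content-dict keys are assumptions for illustration and may differ from the final API:

```python
from transformers import AutoProcessor

# Hypothetical checkpoint, chosen only for illustration
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # hypothetical URL
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }
]

# With tokenize=True and return_tensors="pt", the image is fetched and
# processed as well, so `inputs` can be passed straight to the model.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
```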

For images we use the existing `load_images` functionality, and for videos I added a few new functions. We usually use `av` in all video-related model docs since `decord` had problems with CUDA in the past. Apart from those, we can use `opencv` or `torchvision` for video loading. I ran a small benchmark that loads and uniformly samples 32 frames from around ~100 videos: `decord` was the fastest, while `av` was among the slowest (only `torchvision` was slower). I therefore decided to add a helper that supports all of these backends and lets users switch whenever they want. By default we use `opencv`, as it is a more common CV framework than the other options provided here. A hedged usage sketch follows the benchmark numbers below.

In the future we might switch to torchvision when we add a VideoProcessor class and support VideoProcessorFast (see #33504).

These are the results of the benchmark over ~100 videos:

- decord: 475.2979 sec
- opencv: 614.6062 sec
- av: 1067.0860 sec
- torchvision: 1924.0433 sec
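For reference, a minimal sketch of the backend switch, assuming the `load_video` helper added in this PR (the exact signature and return type are hedged; they may differ by version):

```python
from transformers.image_utils import load_video  # helper location at the time of this PR

# Uniformly sample 32 frames; the backend defaults to "opencv" but can be
# switched explicitly, e.g. to "decord" for speed.
frames = load_video("path/to/video.mp4", num_frames=32, backend="decord")
```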

Requesting review from @Rocketknight1 for the templates and from @qubvel for the general CV-related modifications.

@Rocketknight1 (Member) left a comment:

This looks good to me! Just to clarify: the idea is that if you pass a chat to apply_chat_template where some of the content fields contain images or videos, and tokenize=True, then the images and videos are loaded and processed, so that the output is ready to pass to the model?

@zucchini-nlp (Member, Author) replied:

> the idea is that if you pass a chat to apply_chat_template and some of the content fields contain images or videos, and tokenize=True, then images and videos are loaded and processed, so that the output is ready to pass to the model?

Yep, exactly!
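To illustrate the video case as well, a hedged sketch (the content keys for videos are an assumption, not the confirmed schema):

```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "clip.mp4"},  # hypothetical local path
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

# Videos referenced in the chat are loaded and processed during tokenization.
inputs = processor.apply_chat_template(
    messages, tokenize=True, return_dict=True, return_tensors="pt"
)
```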

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qubvel (Member) left a comment:

Thanks for working on this!

The main question that is not clear to me: what is the backend selection strategy? Should we pass it explicitly? I see `load_video` is used without passing the `backend` and `num_frames` arguments.

@zucchini-nlp (Member, Author) replied:

Welcome back @qubvel! Okay, I'll add more type hints and better docs. The backend should be selectable by the user, but we default to the one that works in all cases and has no weird CUDA-related failures. We should probably document this somewhere, but I haven't yet found a good place for it.

zucchini-nlp requested a review from qubvel on October 29, 2024 at 18:05
@qubvel (Member) left a comment:

Thanks, some nits!

Comment on lines +559 to +568
```python
if video.startswith("https://www.youtube.com") or video.startswith("http://www.youtube.com"):
    if not is_yt_dlp_available():
        raise ImportError("To load a video from YouTube url you have to install `yt_dlp` first.")
    buffer = BytesIO()
    with redirect_stdout(buffer), YoutubeDL() as f:
        f.download([video])
    bytes_obj = buffer.getvalue()
    file_obj = BytesIO(bytes_obj)
elif video.startswith("http://") or video.startswith("https://"):
    file_obj = BytesIO(requests.get(video).content)
```
A Member commented:

Some additional kwargs might be required here, e.g. timeout, but probably fine for now
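For example, a possible refinement along these lines (not part of the PR, just a sketch; the 10-second value is an arbitrary illustration):

```python
# Bound the download time so a stalled server cannot hang the processor.
file_obj = BytesIO(requests.get(video, timeout=10).content)
```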

zucchini-nlp and others added 6 commits October 30, 2024 10:01
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> (×6)
@zucchini-nlp (Member, Author):

Huh, I don't know why it requested review from so many people; feel free to unsubscribe, sorry!

@ArthurZucker (Collaborator) left a comment:

The API looks super good to me!
I am mostly wondering whether this poses a security threat, since we now open links ourselves, whereas before the user had to open the link explicitly in their own code.

@zucchini-nlp (Member, Author) replied:

Hmm, good point about security. We actually already have a few processors that open links for you, e.g. Idefics and Pixtral. I haven't seen anyone flag it as a security issue, so maybe it's not a big deal?

@stevhliu (Member) left a comment:

Thanks!

zucchini-nlp and others added 5 commits January 10, 2025 10:31
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> (×4)
zucchini-nlp merged commit e0646f3 into huggingface:main on Jan 10, 2025. 25 checks passed.
A Collaborator commented:

Let's remove it or put it in the benchmark file, but that's probably overkill!

ArthurZucker added a commit that referenced this pull request Jan 10, 2025
@qubvel (Member) commented on Jan 10, 2025:

Also, this file seems unrelated.

A Member commented:

And this one as well

Comment on lines +76 to +86
```python
if is_decord_available():
    from decord import VideoReader, cpu

if is_av_available():
    import av

if is_cv2_available():
    import cv2

if is_yt_dlp_available():
    from yt_dlp import YoutubeDL
```
@hmellor (Member) commented on Feb 26, 2025:

This block breaks lazy importing of cv2, which vLLM strictly enforces. It happens when vLLM runs `from transformers.image_utils import ImageInput`. vLLM cannot upgrade to v4.49.0 because of it (vllm-project/vllm#13905).

Would it be possible to delay this import? That would be preferable to lazily importing `ImageInput` everywhere it's used in vLLM.

cc @ArthurZucker
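For illustration, a minimal sketch of what a deferred import could look like (the function name and body are assumptions, not the actual transformers code):

```python
def read_video_opencv(video_path: str):
    # Deferred import: cv2 is only loaded when a video is actually read, so
    # `from transformers.image_utils import ImageInput` stays lightweight.
    import cv2

    cap = cv2.VideoCapture(video_path)
    frames = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```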
