Add image-text-to-text task guide #31777
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Very nice job! 🙂
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Awesome work, thanks again!
Thanks for adding!
Some comments, mostly nits
```python
processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
```

This model has a [chat template](./chat_templating) format that's required for the input. Moreover, the model can also accept multiple images as input in a single conversation or message. We will now prepare the inputs.
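For context, a minimal sketch of what preparing these inputs could look like, assuming the processor loaded above and a PIL image; the image URL and prompt text below are placeholders rather than examples from the guide:

```python
import requests
from PIL import Image

# Placeholder image URL — substitute any image you want to ask about.
url = "https://example.com/some_image.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Message-style input; the processor's chat template turns it into the model's prompt format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What do we see in this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
```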
This isn't quite right, we don't need to use a chat template for the model inputs. It's just useful to correctly format the prompt in the case of message-style inputs
You likely know better than I do, but I thought that when fine-tuning, these chat templates are included in the fine-tuning data, so it is required to use chat templates, no? e.g. the Mistral one has <INST> </INST>
It really depends - technically the data could already be formatted as the chat string. It just happens that the message format is commonly used. There's no reason I can't pass a string directly to the tokenizer and model.
I think I confused this with prompt templates
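To illustrate the point above, a hand-written prompt string can be passed to the processor directly, without going through the chat template. The special tokens below are an assumption about the model's expected format, not taken from the guide:

```python
# Skipping apply_chat_template: format the prompt by hand instead.
# The exact placement of <image> and <end_of_utterance> is model-specific,
# so treat this string as illustrative.
prompt = "User:<image>What do we see in this picture?<end_of_utterance>\nAssistant:"
inputs = processor(text=prompt, images=[image], return_tensors="pt")
```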
```python
acc_text = ""
for text_token in streamer:
    time.sleep(0.04)
```
Why do we need to add this?
Otherwise the text flows super fast, which essentially defeats the purpose of streaming (and also, from my experience, it was crashing too).
> otherwise the text flows super fast which is essentially against streaming

I'm a bit confused - don't we want our models to generate text as fast as possible? My understanding of streaming is just that we don't wait for completion before returning the result.
@amyeroberts the streaming feature enables one to see the tokens flow and stop them if the generation is going to a bad place, as in https://huggingface.co/docs/text-generation-inference/en/conceptual/streaming, so we'd like it to wait a bit between tokens.
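For reference, a rough sketch of the surrounding streaming loop, assuming `model` and `inputs` from the earlier steps (the `max_new_tokens` value is arbitrary): generation runs in a background thread, the streamer yields tokens as they are produced, and the short sleep only slows down how quickly they are displayed.

```python
import time
from threading import Thread

from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)
generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=100)

# Run generation in the background so we can consume tokens from the streamer as they arrive.
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

acc_text = ""
for text_token in streamer:
    # Throttle the display a little so the stream is readable and can be interrupted.
    time.sleep(0.04)
    acc_text += text_token
    print(acc_text)
thread.join()
```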
```python
quantized_model = Idefics2ForConditionalGeneration.from_pretrained(model_id, device_map="cuda", quantization_config=quantization_config)
```

And that's it, we can use the model the same way with no changes.
It would be good here to note what kind of change this makes e.g. x% reduction in memory footprint
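As a rough illustration of what such a note could say: loading the weights in 4-bit via bitsandbytes shrinks the weight memory to roughly a quarter of the bf16/fp16 footprint (the exact savings depend on the model and setup, so this is a ballpark figure, not a measurement from the guide).

```python
import torch
from transformers import BitsAndBytesConfig, Idefics2ForConditionalGeneration

model_id = "HuggingFaceM4/idefics2-8b"

# 4-bit weights: roughly ~4x smaller than bf16/fp16 for the weights themselves (ballpark).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
quantized_model = Idefics2ForConditionalGeneration.from_pretrained(
    model_id, device_map="cuda", quantization_config=quantization_config
)
```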
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@amyeroberts I have addressed all your comments, can you merge if you approve?
Looks great - thanks for adding and iterating on this!
@merveenoyan Not really sure why the CI runs are consistently failing here, but since this PR is just a doc page it shouldn't affect the hub etc. I'm going to merge.
Added the image-text-to-text task guide, which includes streaming and more.