Video-LLaVa: handle any number of frames #31221

zucchini-nlp · 2024-06-04T08:56:28Z

What does this PR do?

As per the title, this gives users the freedom to sample any number of frames and fine-tune the model on them. Inference will not generate quality text if more than 8 frames are sampled, so this is only for those who want to tune on longer videos.
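For readers wondering how a variable frame count might be chosen before the frames reach the processor, here is a minimal sketch of uniform frame sampling. It is an illustration only: `sample_frame_indices` is a hypothetical helper, not part of this PR or of the library's API, and the default of 8 simply reflects the frame count mentioned above as the limit for good inference quality.

```python
import numpy as np


def sample_frame_indices(total_frames: int, num_frames: int = 8) -> np.ndarray:
    """Pick `num_frames` evenly spaced frame indices from a video.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Never ask for more frames than the video contains.
    num_frames = min(num_frames, total_frames)
    # Spread indices evenly from the first frame to the last one.
    return np.linspace(0, total_frames - 1, num=num_frames).round().astype(int)


# 8 frames for inference, or more (e.g. 16) when fine-tuning on longer videos.
inference_indices = sample_frame_indices(total_frames=100, num_frames=8)
tuning_indices = sample_frame_indices(total_frames=100, num_frames=16)
```

The returned indices can then be used to select frames from whatever video decoder is in use; decoding itself is left out of the sketch because the PR does not prescribe one.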

All video-llava tests are passing locally.

amyeroberts

Nice - thank you!

HuggingFaceDocBuilderDev · 2024-06-04T09:17:41Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

video-llava can handle more frames

6ef732f

zucchini-nlp requested a review from amyeroberts June 4, 2024 08:56

amyeroberts approved these changes Jun 4, 2024

zucchini-nlp merged commit d64e4da into huggingface:main Jun 4, 2024
21 checks passed
