Great @zucchini-nlp! I'd reach out on the model page on the Hub and ask the authors whether they'd like to add it to the library. If they're happy to leave it as-is, it's all yours!
Model description
Video-LLaVA is a multimodal model trained on images and videos simultaneously. Adding it to the transformers library would be highly beneficial, as it is a strong choice for video question answering.
Open source status
Provide useful links for the implementation
https://huggingface.co/LanguageBind/Video-LLaVA-7B
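For context, here is a minimal sketch of what video question answering with this checkpoint could look like once the model is in transformers. The class names (`VideoLlavaProcessor`, `VideoLlavaForConditionalGeneration`), the prompt format, and the `-hf` repo id are assumptions modelled on the existing LLaVA integration, not a confirmed API.

```python
# Sketch only: class names and repo id below are assumptions, not a confirmed API.
import numpy as np
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

model_id = "LanguageBind/Video-LLaVA-7B-hf"  # assumed transformers-compatible repo id
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)

# Placeholder clip: 8 random RGB frames; in practice these would be decoded
# from a video file (e.g. with PyAV or decord).
video = np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8)

prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```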