Great @zucchini-nlp! I'd reach out on the model page on the Hub and ask the authors whether they'd like to add it to the library. If they're happy to leave it as-is, it's all yours!
Model description
Video-LLaVA is a multimodal model trained on images and videos simultaneously. Adding it to the transformers library would be highly beneficial, as it is a strong choice for video question answering.
Open source status
Provide useful links for the implementation
https://huggingface.co/LanguageBind/Video-LLaVA-7B
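For context, here is a minimal sketch of what video question answering with this checkpoint could look like once the model is in transformers. The class names (`VideoLlavaProcessor`, `VideoLlavaForConditionalGeneration`), the prompt format, and the `-hf` repo id are assumptions modelled on the existing LLaVA integration, not a confirmed API.

```python
# Sketch only: class names and repo id below are assumptions, not a confirmed API.
import numpy as np
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

model_id = "LanguageBind/Video-LLaVA-7B-hf"  # assumed transformers-compatible repo id
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)

# Placeholder clip: 8 random RGB frames; in practice these would be decoded
# from a video file (e.g. with PyAV or decord).
video = np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8)

prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```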