
Video-LLaVA with transformers library #29640

Closed · 2 tasks
Kamakshi8104 opened this issue Mar 13, 2024 · 2 comments · Fixed by #29733

Comments

@Kamakshi8104 commented Mar 13, 2024

Model description

Video-LLaVA is a multimodal model trained on images and videos simultaneously. Adding it to the transformers library would be highly beneficial, as it is a strong choice for video question answering.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

https://huggingface.co/LanguageBind/Video-LLaVA-7B
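
For reference, a minimal sketch of what usage could look like if the model is integrated following the processor/model conventions of the existing LLaVA support in transformers. The class names, the `-hf` checkpoint id, and the prompt template below are assumptions, not a confirmed API:

```python
import numpy as np
import torch
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

# Assumed checkpoint id; this issue only links the original
# LanguageBind/Video-LLaVA-7B weights.
model_id = "LanguageBind/Video-LLaVA-7B-hf"

processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Stand-in clip: 8 RGB frames of 224x224 noise. In practice you would
# decode real frames from a video file (e.g. with PyAV or decord).
video = np.random.randint(0, 255, size=(8, 224, 224, 3), dtype=np.uint8)

# Assumed prompt template, mirroring the LLaVA chat format.
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt").to(
    model.device, dtype=torch.float16
)

output_ids = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```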

@zucchini-nlp (Member) commented

If the authors are not willing to contribute it themselves, I'd like to work on this.

cc @gante

@amyeroberts (Collaborator) commented

Great, @zucchini-nlp! I'd reach out on the model page on the Hub and ask the authors whether they'd like to add it to the library themselves. If they're happy to leave it as-is, it's all yours!
