Multimodal Support (Llava 1.5) #821
Conversation
llava-demo-sml.mov (demo video)
@damian0815 this is insane, thank you! I'll keep this open as it currently points to your fork, but if there's a way we can get llama.cpp to expose the example API and bind to it, I'll happily merge it in.
@abetlen 🙇 i'm waiting on a promised review of my pull request against llama.cpp; the API will likely need to change upstream, so yeah, no point merging this until then.
This is great! However, it seems to have slightly slower inference than the pure C++ code. Does it offload layers to the GPU?
if you're referring to the speed in the video - the demo is running off a laptop, which is almost certainly battery- and/or thermally throttled. it is running on the GPU, but the video isn't intended to illustrate performance :)
When will this be resolved? I need this feature soon - many models are now multimodal.
Also curious what needs to be done for this to be merged - anything I can do to help?
@Josh-XT @zpzheng one thing you could do is leave a comment on my pull request against llama.cpp (ggerganov/llama.cpp#3613).
Why not take …
ok so it seems llama.cpp are just ignoring my work. yay. open source communication FTW. @abetlen do you already have other channels of communication open with the llama.cpp repo? i don't want to re-implement/refactor C++ code for it to be ignored/rejected any more times than is strictly necessary
@damian0815 I'll try to open an issue there as well to get things moving. There are a few projects in the examples folder that I'd love to include in the API here (finetuning, etc.), but I understand that it also makes the API surface larger. I'll see what can be done.
In llama.cpp issue #3798, Llava 1.5 runs on the server and can be used from the browser. I also got Llava 1.5 running on my computer by following the commands in that issue.
yahoo
🔥
Is it possible to use multimodality without the server / OpenAI wrapper?
@remixer-dec yes - I should add this to the docs, but if you check out llama_cpp/server/app.py you'll see that it's done by passing a LLaVA-specific chat_handler to the Llama class.
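For reference, a minimal sketch of that chat_handler approach, without the server. This assumes the Llava15ChatHandler name from llama_cpp.llama_chat_format and uses hypothetical model/mmproj filenames; details may differ across versions:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The CLIP/mmproj projector weights are loaded by the chat handler,
# separately from the language model weights. Filenames are placeholders.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")

llm = Llama(
    model_path="llava-v1.5-7b.Q4_K.gguf",  # hypothetical filename
    chat_handler=chat_handler,
    n_ctx=2048,       # larger context to leave room for the image embedding
    logits_all=True,  # needed by the llava chat handler
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who describes images."},
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])
```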
Do you think Fuyu-8B would work with this?
Works with ggerganov/llama.cpp#3613, i.e. the llava_servable branch on my fork of llama.cpp: https://github.com/damian0815/llama.cpp/tree/llava_servable. The llava C method bindings have just been added to the bottom of llama_cpp.py - lmk if there's somewhere better they should go.

To run the example:
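The run instructions are cut off here in the archived thread. As a rough sketch: in released versions of llama-cpp-python this functionality is exposed through the OpenAI-compatible server via the --clip_model_path and --chat_format flags (these flag names and the model filenames below are assumptions and may not match this PR's branch):

```bash
# Hypothetical filenames; download a LLaVA 1.5 GGUF model and its
# matching mmproj (CLIP projector) weights first.
python -m llama_cpp.server \
  --model llava-v1.5-7b.Q4_K.gguf \
  --clip_model_path mmproj-model-f16.gguf \
  --chat_format llava-1-5
```

The server then accepts OpenAI-style chat completion requests whose user messages include image_url content parts, as in the Python example earlier in the thread.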