
Multimodal Support (Llava 1.5) #821

Merged: 29 commits into abetlen:main on Nov 8, 2023

Conversation

damian0815
Contributor

Works with ggerganov/llama.cpp#3613, i.e. the llava_servable branch on my fork of llama.cpp: https://github.com/damian0815/llama.cpp/tree/llava_servable.

The llava C function bindings have been added at the bottom of llama_cpp.py; let me know if there's somewhere better they should go.
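For context, the bindings follow the same ctypes pattern as the rest of llama_cpp.py. A minimal sketch of what one such binding looks like (the struct and function below mirror llama.cpp's llava.h at the time, but treat the library path and any details here as illustrative, not authoritative):

```python
import ctypes

# Sketch only: the library name/path is an assumption; llama_cpp.py locates
# its own copy of the shared library built with the llava example code.
_libllava = ctypes.CDLL("libllava.so")

class llava_image_embed(ctypes.Structure):
    # Mirrors llava.h: struct llava_image_embed { float * embed; int n_image_pos; };
    _fields_ = [
        ("embed", ctypes.POINTER(ctypes.c_float)),
        ("n_image_pos", ctypes.c_int),
    ]

# struct llava_image_embed * llava_image_embed_make_with_filename(
#     struct clip_ctx * ctx_clip, int n_threads, const char * image_path);
_libllava.llava_image_embed_make_with_filename.argtypes = [
    ctypes.c_void_p,  # clip_ctx *
    ctypes.c_int,     # n_threads
    ctypes.c_char_p,  # image_path
]
_libllava.llava_image_embed_make_with_filename.restype = ctypes.POINTER(llava_image_embed)
```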

To run the example:

cd llama-cpp-python/examples/multimodal
python llava.py -m path/to/llava-v1.5/ggml-model-q5_k.gguf --mmproj path/to/mmproj-model-f16.gguf

@damian0815
Contributor Author

[video attachment: llava-demo-sml.mov]

@abetlen
Owner

abetlen commented Oct 18, 2023

@damian0815 this is insane, thank you! I'll keep this open since it currently points to your fork, but if there's a way we can get llama.cpp to expose the example API and bind to it, I'll happily merge it in.

@damian0815
Contributor Author

@abetlen 🙇

I'm waiting on a promised review of my pull request against llama.cpp; the API will likely need to change upstream, so there's no point merging this until then.

@damian0815 damian0815 marked this pull request as draft October 18, 2023 20:45
@y10ab1

y10ab1 commented Oct 23, 2023

This is great! However, it seems to have slightly slower inference than the pure C++ code. Does it offload layers to the GPU?

[video attachment: llava-demo-sml.mov]

@damian0815
Contributor Author

If you're referring to the speed in the video: the demo is running on a laptop, so it's almost certainly battery- and/or thermally throttled. It is running on the GPU, but the video isn't intended to illustrate performance :)
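(For the offloading question: in llama-cpp-python, GPU offload is controlled by the n_gpu_layers argument to Llama. A minimal sketch, assuming a Metal- or CUDA-enabled build and a hypothetical model path:)

```python
from llama_cpp import Llama

# -1 offloads all layers to the GPU; 0 keeps everything on the CPU.
llm = Llama(
    model_path="path/to/ggml-model-q5_k.gguf",  # hypothetical path
    n_gpu_layers=-1,
)
```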

@zpzheng

zpzheng commented Oct 25, 2023

When will this be resolved? I need this feature soon; many models are now multimodal.

@Josh-XT
Contributor

Josh-XT commented Oct 25, 2023

Also curious what needs to be done for this to be merged. Anything I can do to help?

@damian0815
Contributor Author

damian0815 commented Oct 25, 2023

@Josh-XT @zpzheng one thing you could do is leave a comment on my llama.cpp PR (ggerganov/llama.cpp#3613); there's a code review there I've been waiting on for over a week now.

@zhicwu

zhicwu commented Oct 26, 2023

Why not use a subprocess as a temporary workaround to unblock your work first? It crashes sometimes on my MacBook and PC. You may also want to try ggerganov/llama.cpp#3682.
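A rough sketch of that subprocess workaround, shelling out to the llava example binary from a llama.cpp build (the binary name, flags, and paths here are assumptions to adapt to your setup):

```python
import subprocess

result = subprocess.run(
    [
        "./llava",                            # llava example binary (path assumed)
        "-m", "ggml-model-q5_k.gguf",         # llava language model
        "--mmproj", "mmproj-model-f16.gguf",  # multimodal projector
        "--image", "input.jpg",
        "-p", "Describe this image.",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```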

@damian0815
Contributor Author

OK, so it seems llama.cpp is just ignoring my work. Yay, open-source communication FTW.

@abetlen do you already have other channels of communication open with the llama.cpp repo? I don't want to re-implement or refactor C++ code only for it to be ignored/rejected more times than strictly necessary.

@abetlen
Owner

abetlen commented Nov 1, 2023

@damian0815 I'll try to open an issue there as well to get things moving. There are a few projects in the examples folder that I'd love to include in the API here (fine-tuning, etc.), but I understand that it also makes the API surface larger. I'll see what can be done.

@BobCN2017

BobCN2017 commented Nov 5, 2023

Following issue 3798, Llava 1.5 can run on the server and be used from the browser. I was also able to run Llava 1.5 on my computer using the commands from that issue.

@abetlen abetlen changed the title llava v1.5 support Multimodal Support (Llava 1.5) Nov 6, 2023
@damian0815
Contributor Author

yahoo

@abetlen
Owner

abetlen commented Nov 8, 2023

[screenshot: multimodal chat completion working against the server]

Still have to write up some docs for setting up the server, but it's working with the new OpenAI API.
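For anyone who wants to try it before the docs land, a sketch of a request against the local server using the openai Python client; the message shape follows OpenAI's vision format, which the server mimics, but the base_url, api_key, and model name below are placeholders:

```python
from openai import OpenAI

# Point the official client at a locally running llama-cpp-python server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key-needed")

response = client.chat.completions.create(
    model="llava-v1.5",  # placeholder; use the model name your server reports
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                {"type": "text", "text": "What is shown in this image?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```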

@abetlen abetlen marked this pull request as ready for review November 8, 2023 03:43
@abetlen abetlen merged commit aab74f0 into abetlen:main Nov 8, 2023
@teleprint-me
Contributor

🔥

@remixer-dec

Is it possible to use multimodality without the server / OpenAI wrapper?

@abetlen
Owner

abetlen commented Nov 10, 2023

@remixer-dec yes. I should add this to the docs, but if you check out llama_cpp/server/app.py you'll see that it's done by passing a llava-specific chat_handler to the Llama class.
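A minimal sketch of that direct-API path (the handler and argument names here reflect what this PR added, e.g. Llava15ChatHandler in llama_cpp.llama_chat_format; double-check against the current source, and treat the model paths as placeholders):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj/CLIP model embeds the image; the main model generates text.
chat_handler = Llava15ChatHandler(clip_model_path="path/to/mmproj-model-f16.gguf")
llm = Llama(
    model_path="path/to/llava-v1.5/ggml-model-q5_k.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,       # leave room for the image embedding tokens
    logits_all=True,  # early llava support needed this; may not be required anymore
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```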

@exploringweirdmachines

exploringweirdmachines commented Nov 16, 2023

Do you think Fuyu-8B would work with this?
