
MM-ReAct (Multimodal Reasoning and Action) #2262

Closed
slavakurilyak opened this issue Apr 1, 2023 · 7 comments

@slavakurilyak

slavakurilyak commented Apr 1, 2023

For GPT-4, image inputs are still in limited alpha.

For GPT-3.5, it would be great to see LangChain use the MM-ReAct agent.

@aryansid

Is it available in LangChain now?

@slavakurilyak
Author

GPT-4(Vision) is now available, which unlocks multimodal reasoning and action.

According to OpenAI:

We’re excited to roll out these capabilities to other groups of users, including developers, soon after.

This model is currently available through the web UI, but API access is inevitable.

@alifanov

LLaVA (multimodal LLaMA) is now open source.

@slavakurilyak
Author

slavakurilyak commented Oct 13, 2023

LLaVA is now merged into llama.cpp here.

LangChain provides bindings to llama.cpp here and here.
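For context, here is a minimal sketch of the existing LangChain wrapper for llama.cpp (text prompts only as of this comment). The model path is a placeholder and assumes a local GGUF file plus `llama-cpp-python` installed:

```python
# Minimal sketch: LangChain's LlamaCpp LLM wrapper (text prompts only).
# Assumes `pip install llama-cpp-python langchain` and a local GGUF model;
# the path below is a placeholder, not a file shipped with either project.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/ggml-model-q4_0.gguf",  # placeholder local model path
    n_ctx=2048,
    temperature=0.1,
)

print(llm("Summarize what a ReAct-style agent does."))
```

This wrapper only covers text prompts; passing images to LLaVA through it is a separate integration (see the next comment).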

I'm closing this issue since LLaVA is a strong alternative to MM-ReAct.

@rlancemartin
Collaborator

rlancemartin commented Oct 14, 2023

It's true that LLaVA was merged into llama.cpp.

But we also need multimodal integration with llama-cpp-python to run LLaVA directly in LangChain.

I put a ticket up for it:
abetlen/llama-cpp-python#813

Worth keeping this open until that is in place.

I merged a notebook that runs LLaVA (via a bash script) without the Python integration:
#11582

@rlancemartin rlancemartin reopened this Oct 14, 2023
@Se-Hun

Se-Hun commented Nov 14, 2023

Now we can use the gpt-4-vision-preview model.

https://platform.openai.com/docs/guides/vision
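Following that guide, a minimal sketch of an image request against the Chat Completions API (assumes the `openai>=1.0` Python client and `OPENAI_API_KEY` set in the environment; the image URL is a placeholder):

```python
# Minimal sketch of a gpt-4-vision-preview request, following the linked guide.
# Assumes `pip install openai>=1.0` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                # placeholder URL; a base64 data URL also works per the guide
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```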

@slavakurilyak
Author

According to OpenAI:

GPT-4 with vision is currently available to all developers who have access to GPT-4 via the gpt-4-vision-preview model and the Chat Completions API which has been updated to support image inputs

Since multimodal LLMs like GPT-4V are now generally available, I'm closing this issue.
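For the LangChain side, a rough sketch of calling gpt-4-vision-preview through `ChatOpenAI` (assumes a recent `langchain` release with the OpenAI integration; the multimodal content list is forwarded to the Chat Completions API, and the image URL is a placeholder):

```python
# Rough sketch: gpt-4-vision-preview via LangChain's ChatOpenAI wrapper.
# Assumes `pip install langchain openai` and OPENAI_API_KEY in the environment.
from langchain.chat_models import ChatOpenAI
from langchain.schema.messages import HumanMessage

chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=256)

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image."},
        # placeholder URL; the content list is passed through to the OpenAI API
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ]
)

print(chat.invoke([message]).content)
```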
