
MM-ReAct (Multimodal Reasoning and Action) #2262

Closed
slavakurilyak opened this issue Apr 1, 2023 · 7 comments

@slavakurilyak

slavakurilyak commented Apr 1, 2023

For GPT-4, image inputs are still in limited alpha.

For GPT-3.5, it would be great to see LangChain use the MM-ReAct agent.

@aryansid

Is it available in LangChain now?

@slavakurilyak
Author

GPT-4(Vision) is now available, which unlocks multimodal reasoning and action.

According to OpenAI:

We’re excited to roll out these capabilities to other groups of users, including developers, soon after.

This model is currently available through the web UI, but API access is inevitable.

@alifanov

LLaVA (multimodal LLaMA) is now open source.

@slavakurilyak
Author

slavakurilyak commented Oct 13, 2023

LLaVA is now merged into llama.cpp here.

LangChain provides bindings to llama.cpp here and here.
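For context, here is a minimal sketch of the existing LangChain wrapper for llama.cpp (text prompts only as of this comment). The model path is a placeholder and assumes a local GGUF file plus `llama-cpp-python` installed:

```python
# Minimal sketch: LangChain's LlamaCpp LLM wrapper (text prompts only).
# Assumes `pip install llama-cpp-python langchain` and a local GGUF model;
# the path below is a placeholder, not a file shipped with either project.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/ggml-model-q4_0.gguf",  # placeholder local model path
    n_ctx=2048,
    temperature=0.1,
)

print(llm("Summarize what a ReAct-style agent does."))
```

This wrapper only covers text prompts; passing images to LLaVA through it is a separate integration (see the next comment).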

I'm closing this issue since LLaVA is a strong alternative to MM-ReAct.

@rlancemartin
Collaborator

rlancemartin commented Oct 14, 2023

It's true that LLaVA was merged into llama.cpp.

But we also need multimodal integration with llama-cpp-python to run LLaVA directly in LangChain.

I put a ticket up for it:
abetlen/llama-cpp-python#813

Worth keeping this open until that is in place.

I merged a notebook that runs LLaVA (via a bash script) without the Python integration:
#11582

@rlancemartin rlancemartin reopened this Oct 14, 2023
@Se-Hun

Se-Hun commented Nov 14, 2023

Now we can use the gpt-4-vision-preview model.

https://platform.openai.com/docs/guides/vision
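Following that guide, a minimal sketch of an image request against the Chat Completions API (assumes the `openai>=1.0` Python client and `OPENAI_API_KEY` set in the environment; the image URL is a placeholder):

```python
# Minimal sketch of a gpt-4-vision-preview request, following the linked guide.
# Assumes `pip install openai>=1.0` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                # placeholder URL; a base64 data URL also works per the guide
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```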

@slavakurilyak
Author

According to OpenAI:

GPT-4 with vision is currently available to all developers who have access to GPT-4 via the gpt-4-vision-preview model and the Chat Completions API which has been updated to support image inputs

Since multimodal LLMs like GPT-4V are now generally available, I'm closing this issue.
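For the LangChain side, a rough sketch of calling gpt-4-vision-preview through `ChatOpenAI` (assumes a recent `langchain` release with the OpenAI integration; the multimodal content list is forwarded to the Chat Completions API, and the image URL is a placeholder):

```python
# Rough sketch: gpt-4-vision-preview via LangChain's ChatOpenAI wrapper.
# Assumes `pip install langchain openai` and OPENAI_API_KEY in the environment.
from langchain.chat_models import ChatOpenAI
from langchain.schema.messages import HumanMessage

chat = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=256)

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image."},
        # placeholder URL; the content list is passed through to the OpenAI API
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ]
)

print(chat.invoke([message]).content)
```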
