LiteLLMModel - detect message flattening based on model information #553
It seems like a reasonable idea to decide on message flattening based on info from LiteLLM.
However, I couldn't find anything in LiteLLM's internals or in the Ollama API that would tell us whether a model is a VLM or not. So, for now, I had to hardcode that for llava, which is currently the only VLM :/
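For illustration, here's a minimal sketch of what such a hardcoded heuristic could look like; the helper name and the `flatten_messages_as_text` flag are assumptions for this example, not the exact code in the PR:

```python
def should_flatten_messages(model_id: str) -> bool:
    """Hypothetical heuristic: flatten messages for Ollama models, except known VLMs."""
    # LiteLLM addresses Ollama models as "ollama/<name>" or "ollama_chat/<name>".
    if model_id.startswith(("ollama/", "ollama_chat/")):
        # llava is hardcoded as the only known VLM for now, so it keeps
        # structured (non-flattened) messages to preserve image parts.
        return "llava" not in model_id
    return False
```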
Tested CodeAgent with llava:
It does take a while to produce something, but it still seems to work:

Not ideal by any means, but ...
LiteLLM itself has utility functions to check whether a model supports vision, but they don't really work well with Ollama:
Most likely because they iterate over model metadata that, for Ollama models, provides nothing interesting:
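As a point of reference, this is roughly what that check looks like with LiteLLM's `supports_vision` helper; the Ollama behavior shown in the comments is what I'd expect given the missing metadata, not a guaranteed result:

```python
import litellm

# supports_vision looks the model up in LiteLLM's model metadata map.
# Hosted models with known metadata resolve fine:
print(litellm.supports_vision(model="gpt-4o"))        # True
# Locally served Ollama models usually have no useful metadata,
# so the check likely comes back False even for VLMs like llava:
print(litellm.supports_vision(model="ollama/llava"))  # likely False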
However, we can technically get this info ourselves by making an `/api/show` call to Ollama and fetching it from `projector_info` in the response (as in the case of llava), or, in the case of llama3.2-vision, by looking for the `vision` substring in the model info it returns.
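A minimal sketch of that detection, assuming a local Ollama server on the default `http://localhost:11434` and that the `/api/show` response contains `projector_info` (as for llava) and a `model_info` map whose keys contain `vision` for models like llama3.2-vision; the exact response layout isn't a stable contract, so treat this as a heuristic:

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama endpoint


def ollama_model_supports_vision(model_name: str) -> bool:
    """Best-effort VLM check via Ollama's /api/show endpoint (sketch only)."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/show", json={"model": model_name}, timeout=10
    )
    resp.raise_for_status()
    data = resp.json()
    # llava-style models expose a non-empty projector_info block.
    if data.get("projector_info"):
        return True
    # llama3.2-vision-style models expose keys containing "vision" in model_info.
    model_info = data.get("model_info") or {}
    return any("vision" in key.lower() for key in model_info)


# Example usage:
for name in ("llava", "llama3.2-vision", "llama3.1"):
    print(name, ollama_model_supports_vision(name))
```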