LiteLLMModel - detect message flatenning based on model information #553

sysradium · 2025-02-07T23:04:49Z

Deciding on message flattening based on info from LiteLLM seems like a reasonable idea.
However, I was unable to find anything in its internals or in the Ollama API that tells whether a model is a VLM or not. So, for now, I had to hardcode it for llava, which is currently the only VLM :/
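
For illustration, the hardcoded check could look roughly like the sketch below. It assumes a model-info dict with litellm_provider and key fields, as in the get_model_info output quoted further down, and it assumes that only Ollama models get flattened. The names should_flatten_messages and KNOWN_OLLAMA_VLMS are hypothetical and are not taken from this PR's diff.

```python
# Hypothetical sketch, not the code merged in this PR.
# Assumption: only Ollama models need flattening, and llava is the one VLM exempted.
KNOWN_OLLAMA_VLMS = {"llava"}


def should_flatten_messages(model_info: dict) -> bool:
    """Return True when chat messages should be flattened to plain text.

    Non-Ollama providers are never flattened here; Ollama models are
    flattened unless they appear in the hardcoded VLM set.
    """
    if model_info.get("litellm_provider") != "ollama":
        return False
    return model_info.get("key") not in KNOWN_OLLAMA_VLMS


# Example model-info dicts (the second key is made up for illustration).
llava_info = {"key": "llava", "litellm_provider": "ollama", "mode": "chat"}
text_only_info = {"key": "some-text-model", "litellm_provider": "ollama", "mode": "chat"}

print(should_flatten_messages(llava_info))
print(should_flatten_messages(text_only_info))
```

The drawback is the one mentioned above: every new Ollama VLM would have to be added to the set by hand.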

Tested CodeAgent with llava:

from smolagents import LiteLLMModel, CodeAgent
from PIL import Image

model = LiteLLMModel(api_base="http://localhost:11434", model_id="ollama/llava", num_ctx=16384)

agent = CodeAgent(model=model, tools=[])
agent.run("What is in the provided image?", images=[Image.open("./tests/fixtures/000000039769.png")])

It does take a while to produce something, but it still seems to work.

Not ideal by any means, but ...

LiteLLM itself has utility functions to check whether a model supports vision, but they don't really work well with Ollama:

supports_vision('llava', 'ollama')
False

Most likely because it iterates over this model info, which provides nothing useful:

litellm.OllamaConfig().get_model_info('llava')
{'key': 'llava', 'litellm_provider': 'ollama', 'mode': 'chat', 'supports_function_calling': False, 'input_cost_per_token': 0.0, 'output_cost_per_token': 0.0, 'max_tokens': 32768, 'max_input_tokens': 32768, 'max_output_tokens': 32768}

However, we can technically get this info ourselves by making an /api/show call to Ollama and reading it from projector_info. For example, in the case of llava:

        "clip.has_llava_projector": true,
        "clip.has_text_encoder": false,
        "clip.has_vision_encoder": true,
        "clip.projector_type": "mlp",
        "clip.use_gelu": false,
        "clip.vision.attention.head_count": 16,
        "clip.vision.attention.layer_norm_epsilon": 1e-05,
        "clip.vision.block_count": 23,
        "clip.vision.embedding_length": 1024,
        "clip.vision.feed_forward_length": 4096,

And in the case of llama3.2-vision:

    "projector_info": {
        "general.architecture": "mllama",
        "general.description": "vision encoder for Mllama",
        "general.file_type": 1,
        "general.name": "Llama-3.2-11B-Vision-Instruct",
        "general.parameter_count": 895028756,
        "general.type": "projector",
        "mllama.vision.attention.head_count": 16,
        "mllama.vision.attention.layer_norm_epsilon": 1e-05,

Here we can look for the "vision" substring in the keys.
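
A rough sketch of that lookup, using only the standard library. It assumes the /api/show response carries a projector_info object like the fragments above and that the request body field is called model (older Ollama versions may expect name instead). The helper names has_vision_projector and fetch_model_details are hypothetical.

```python
import json
from urllib import request


def has_vision_projector(show_response: dict) -> bool:
    """Return True if any projector_info key contains the substring 'vision'."""
    projector_info = show_response.get("projector_info") or {}
    return any("vision" in key for key in projector_info)


def fetch_model_details(model: str, api_base: str = "http://localhost:11434") -> dict:
    """POST to Ollama's /api/show and return the decoded JSON response.

    Assumption: the body field is "model"; this needs a running Ollama server.
    """
    payload = json.dumps({"model": model}).encode("utf-8")
    req = request.Request(
        f"{api_base}/api/show",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Offline check against fragments shaped like the ones quoted above.
llava_like = {"projector_info": {"clip.has_vision_encoder": True, "clip.vision.block_count": 23}}
mllama_like = {"projector_info": {"mllama.vision.attention.head_count": 16}}
text_only = {"model_info": {"general.architecture": "llama"}}

print(has_vision_projector(llava_like))
print(has_vision_projector(mllama_like))
print(has_vision_projector(text_only))
```

With a local server running, has_vision_projector(fetch_model_details("llava")) would replace the hardcoded model name, at the cost of one extra HTTP call per model.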

sysradium · 2025-02-12T17:36:04Z

@aymeric-roucher if you have a sec. Sorry, can't assign you as a reviewer

aymeric-roucher

Thank you @sysradium !

(LiteLLModel) detect message flatenning based on model information

0d9bc47

sysradium marked this pull request as ready for review February 7, 2025 23:08

sysradium mentioned this pull request Feb 7, 2025

LiteLLM ollama bugs Update #551

Open

sysradium changed the title ~~(LiteLLModel) detect message flatenning based on model information~~ LiteLLMModel - detect message flatenning based on model information Feb 7, 2025

sysradium mentioned this pull request Feb 9, 2025

[BUG] helium vision_web_browser.py NoneType error after saving image #570

Closed

aymeric-roucher approved these changes Feb 12, 2025

aymeric-roucher merged commit 392fc5a into huggingface:main Feb 13, 2025
3 checks passed

sysradium deleted the better-flattening-control branch February 13, 2025 10:26
