
[BUG] Sambanova, Groq and Cerebras models not found after update to 1.9.1 #655

faev999 opened this issue Feb 14, 2025 · 11 comments

@faev999

faev999 commented Feb 14, 2025

Describe the bug
Sambanova, Groq and Cerebras models are not found after updating to 1.9.1. Running an agent (whether CodeAgent or ToolCallingAgent) with models from these providers results in an error.

Code to reproduce the error

import os

from smolagents import CodeAgent, LiteLLMModel, ToolCallingAgent

model_agent_a = LiteLLMModel(
    model_id="cerebras/llama-3.3-70b", # same for groq and sambanova
    api_key=os.getenv("CEREBRAS_API_KEY"),
)

agent_a = ToolCallingAgent(
    tools=[],
    model=model_agent_a,
)

agent_a.run(task="hello")

Error logs (if any)

Error in generating tool call with model:
This model isn't mapped yet. model=cerebras/llama-3.3-70b, custom_llm_provider=cerebras. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.

Expected behavior
In version 1.8.1 this was working normally with a smol tweak mentioned here

Packages version:
1.9.1

@aymeric-roucher
Collaborator

This issue seems to be on the litellm side. Could you share your litellm version?

@faev999
Author

faev999 commented Feb 14, 2025

The version I am using for litellm is 1.60.8.
The error is generated by this line:
model_info: dict = litellm.get_model_info(self.model_id)
The problem with litellm's get_model_info() is that it depends on this file, which might not be kept up to date.
Also, I found that although some of the models I was testing (e.g. groq/llama-3.3-70b-specdec) do exist in that file, I was still getting the same error. It works if api_base is declared, e.g. api_base="https://api.groq.com/openai/v1/".
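
For concreteness, a minimal sketch of that api_base workaround (assuming LiteLLMModel forwards api_base to litellm, and that GROQ_API_KEY is the environment variable holding the key):

import os

from smolagents import LiteLLMModel, ToolCallingAgent

# Sketch of the api_base workaround: pointing the model at Groq's
# OpenAI-compatible endpoint sidesteps the model-mapping lookup failure.
model = LiteLLMModel(
    model_id="groq/llama-3.3-70b-specdec",
    api_base="https://api.groq.com/openai/v1/",
    api_key=os.getenv("GROQ_API_KEY"),  # assumed env var name
)

agent = ToolCallingAgent(tools=[], model=model)
agent.run(task="hello")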

@aymeric-roucher
Collaborator

Possible fixes:

  • Revert litellm to an earlier version
  • Use OpenAIServerModel instead

Tell me if neither works!

@faev999
Author

faev999 commented Feb 14, 2025

I just briefly checked the litellm repository, and it seems there are no major bugs in the get_model_info() function. The "problem" is that some models simply don't exist in the model declaration file I mentioned previously. One could open a request asking them to add the desired model to the file, or open a PR to add it oneself.

"Use OpenAIServerModel instead" could you clarify how to do this?

For now I modified that line in 1.9.1 so it looks more like 1.8.1:

@cached_property
def _flatten_messages_as_text(self):
    return self.model_id.startswith(("ollama", "groq", "gemini", "openrouter", "cerebras", "sambanova"))

@aymeric-roucher
Collaborator

aymeric-roucher commented Feb 14, 2025

Oh I see what you mean! Then yes, reverting the litellm version won't work, and your fix might work better. Why do you add gemini, openrouter, cerebras & sambanova to the list? When you confirm that, I'll make a final list of providers to apply message flattening for and make a patch release.

Re: OpenAIServerModel: just use this as your model class, and point the api_base to your inference provider's server, like:

model = OpenAIServerModel(
    model_id="...",
    api_base="http://...",
    api_key="...",
)
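
For the Cerebras model from the original report, a filled-in sketch might look like this (the base URL, model name and env var are illustrative assumptions, not verified values):

import os

from smolagents import OpenAIServerModel, ToolCallingAgent

# Sketch: talk to Cerebras through its OpenAI-compatible endpoint, bypassing
# litellm's model-info lookup entirely. Values are illustrative assumptions.
model = OpenAIServerModel(
    model_id="llama-3.3-70b",               # no "cerebras/" prefix here
    api_base="https://api.cerebras.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.getenv("CEREBRAS_API_KEY"),
)

agent = ToolCallingAgent(tools=[], model=model)
agent.run(task="hello")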

@aymeric-roucher
Collaborator

I've just done release 1.9.2 with a fix. Please tell me if it still does not work!

@sysradium
Contributor

sysradium commented Feb 15, 2025

@aymeric-roucher

Since we switched to startswith detection of flattening (which breaks vision over ollama), maybe it makes sense to proactively check all custom LLM providers here and add all of them 🤔
https://github.com/BerriAI/litellm/blob/0ffd99afff449b9e79c5f5316bf14fe011eb98df/litellm/litellm_core_utils/get_llm_provider_logic.py#L388

I thought get_model_info is still called internally inside litellm at some point, so it is hard to avoid it.

With message flattening, one way to do it for all models is maybe to use:

flatten_messages_as_text = False
try:
    # Probe litellm's prompt factory: if a prompt template exists for this
    # provider, the model expects structured (non-flat) messages.
    litellm.litellm_core_utils.prompt_templates.factory.prompt_factory(
        self.model_id,
        [
            {"content": {"text": "sample"}},
        ],
        custom_llm_provider=self.model_id.split("/")[0],
    )
except Exception:
    flatten_messages_as_text = True

It kind of seems to work, but looks a bit awkward.

I just don't like that this way we ditched ollama vision models ...

@faev999
Author

faev999 commented Feb 15, 2025

"Why do you add gemini, openrouter, cerebras & sambanova to the list? When you confirm that, I'll make a final list of providers ..."

I have found that if I don't flatten the messages when using those providers, then tool usage is no longer reliable, especially with ToolCallingAgent as the manager agent: e.g. gemini 2 models simply refuse to use tools or don't even know they exist. However, this was just from some small tests I did. I have been using llama 3.3 70b and qwen 2.5 32b with cerebras and groq with very good results using the modifications I mentioned.

"Re: OpenAIServerModel: just use this as your model calss, and point the api_base to your inference provider's server, like..."

I will try that next, thank you.

@faev999
Author

faev999 commented Feb 15, 2025

"I thought get_model_info is still called internally inside litellm at some point, so it is hard to avoid it. [...] I am just not liking that this way we ditched ollama vision models ..."

In 1.9.1, get_model_info() uses the whole model name just to extract the provider and then check whether the provider is ollama, right? No other fields of model_info are being used?

model_info: dict = litellm.get_model_info(self.model_id)
if model_info["litellm_provider"] == "ollama":

I think I found three different ways to get the provider without depending on the model name existing in that file.

import litellm

provider = litellm.get_llm_provider(model="cerebras/llama-3.3-70b")

print(provider)

This prints: ('llama-3.3-70b', 'cerebras', '<value of CEREBRAS_API_KEY>', 'https://api.cerebras.ai/v1'). This function will raise an exception if the provider name doesn't exist, e.g. model="cerererebras/llama-3.3-70b".
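
A minimal sketch of using that behaviour as a provider check (catching a broad Exception, since the exact exception class isn't pinned down here):

import litellm

def resolve_provider(model_id):
    """Return the provider litellm resolves for model_id, or None if unknown."""
    try:
        _, provider, _, _ = litellm.get_llm_provider(model=model_id)
        return provider
    except Exception:  # raised for typos such as "cerererebras/llama-3.3-70b"
        return None

print(resolve_provider("cerebras/llama-3.3-70b"))      # 'cerebras'
print(resolve_provider("cerererebras/llama-3.3-70b"))  # None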

Another easy way to verify that the provider exists is:

from litellm.types.utils import LlmProvidersSet, LlmProviders

is_provider = "ollama" in LlmProvidersSet

Another way is to use the file itself, which is loaded into the variable litellm.model_cost, so one could split the provider from the model name, i.e. "cerebras" from "cerebras/llama-3.3-70b", and then check something like if 'cerebras' in litellm.model_cost. One can also use the function litellm.litellm_core_utils.get_model_cost_map(url), which deserializes the contents of the remote JSON file into an object that can be used for the comparison.
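
A hedged sketch of that model_cost idea (assuming litellm.model_cost is a dict keyed by model names, some of them carrying a provider/ prefix):

import litellm

model_id = "cerebras/llama-3.3-70b"
provider = model_id.split("/")[0]

# The model may be listed under its full name, or the provider may only
# appear as a prefix on other entries in the cost map.
known_model = model_id in litellm.model_cost
known_provider = any(key.startswith(provider + "/") for key in litellm.model_cost)
print(known_model, known_provider)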

So an approach to keep ollama vision models would be:

@cached_property
def _flatten_messages_as_text(self):
    import litellm

    provider = litellm.get_llm_provider(model=self.model_id)
    if provider and provider[1] == "ollama":
        # Keep structured (non-flat) messages for ollama vision models like llava
        model_info: dict = litellm.get_model_info(self.model_id)
        return model_info["key"] != "llava"
    elif provider and provider[1] in ["..."]:  # selected providers that need flattening
        return True
    return False

@sysradium
Contributor

sysradium commented Feb 15, 2025

@faev999 I was thinking that I could push the LiteLLM devs to include more information in get_model_info so that we could make more informed decisions, like what modes the model supports, etc.

Currently what we want to know is whether the model supports non-flat messages or not. I wish they could return that as well. From how I read their code, if they have some prompt transformation for the model, then it expects some kind of non-flat message; otherwise flat. Hence my solution above. It looks ugly and maybe relies too much on the internals :/
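
For the vision side specifically, a minimal sketch of leaning on litellm's capability helper (assuming litellm.supports_vision exists in the installed version and that the model is mapped):

import litellm

# Sketch: a capability helper could drive the flattening decision for vision,
# though it still depends on the model being present in litellm's mapping.
for model_id in ("ollama/llava", "cerebras/llama-3.3-70b"):
    try:
        print(model_id, litellm.supports_vision(model=model_id))
    except Exception:
        print(model_id, "not mapped")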

I am not super happy hardcoding model/provider names, since that's a never-ending game.

@nikolaidk

Maybe model validation should not be so strict, or one should have an option to skip model validation. Using LiteLLM directly, there is no problem using a model that is not listed, such as beta models, and one can add the necessary configuration in code to use unlisted models. For prototyping it would be nice if this litellm implementation worked more like litellm's original implementation, so that we could just pass a real litellm model reference.
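
A sketch of what that looks like when calling litellm directly (the model name is a hypothetical unlisted one; the endpoint and env var are assumptions):

import os

import litellm

# Sketch: litellm itself will happily call an unlisted/beta model as long as
# the provider endpoint accepts it; no model-info lookup is required up front.
response = litellm.completion(
    model="cerebras/some-beta-model",       # hypothetical, not in the mapping file
    api_base="https://api.cerebras.ai/v1",  # assumed endpoint
    api_key=os.getenv("CEREBRAS_API_KEY"),
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)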
