
can we implement rate limiting? #39

Closed
batrlatom opened this issue Jan 2, 2025 · 10 comments

Comments

@batrlatom

Hi, I have a problem using the Gemini model via LiteLLM: I am getting rate limited very frequently. What about adding some waiting time between calls so it does not happen?

@vanetreg

vanetreg commented Jan 2, 2025

> Hi, I have a problem using the Gemini model via LiteLLM: I am getting rate limited very frequently. What about adding some waiting time between calls so it does not happen?

Ask Gemini to write a rate limit handler for you :)

@batrlatom
Author

>> Hi, I have a problem using the Gemini model via LiteLLM: I am getting rate limited very frequently. What about adding some waiting time between calls so it does not happen?
>
> Ask Gemini to write a rate limit handler for you :)

lol ... it would be rate limited :]

@vanetreg

vanetreg commented Jan 2, 2025

>> Ask Gemini to write a rate limit handler for you :)
>
> lol ... it would be rate limited :]

Do it in any chat, obviously; haven't you tried?
https://huggingface.co/spaces/osanseviero/gemini-coder

@batrlatom
Author

Man, honestly, it's like I could answer any of your questions with the same answer: just generate it.

@luandro

luandro commented Jan 5, 2025

This is pretty relevant, and the responses quite annoying. Please re-open it @batrlatom, unless you've found a way.

@vanetreg

vanetreg commented Jan 5, 2025

> This is pretty relevant, and the responses quite annoying. Please re-open it @batrlatom, unless you've found a way.

@luandro
Why is it annoying to recommend asking an LLM to generate a handler/wrapper function for you that handles rate limits?
We do it all the time in Cursor, Windsurf, Replit, OpenAI Canvas etc.
Just try it, adding the specific rate limits into your prompt.

@luandro

luandro commented Jan 5, 2025

I'm quite familiar with those LLM coding tools, but where do I prompt for this change? What works and what doesn't? They won't substitute for knowing working examples. By sharing what works we can improve together, and other people who face this issue in the future will have a quality reference for how to deal with it.

@batrlatom
Author

Thanks @luandro. I switched to the qwen-2.5-coder model since it gives me slightly better results than Gemini in my case.
But I will try to tackle this issue again. I have found some rate-limiting support in LiteLLM itself (https://docs.litellm.ai/docs/proxy/users#set-rate-limits). I will post a PR if it resolves the issue.

@batrlatom batrlatom reopened this Jan 6, 2025
@batrlatom
Author

Btw, I am not experiencing the problem anymore for now. But if we really want to add limits to avoid hammering Gemini too much, the solution could be as simple as:

import time
from typing import Dict, List, Optional


class RateLimitedLiteLLMModel(LiteLLMModel):
    def __init__(self, model_id="anthropic/claude-3-5-sonnet-20240620", api_base=None, api_key=None, call_delay=0):
        super().__init__(model_id=model_id, api_base=api_base, api_key=api_key)
        self.call_delay = call_delay  # seconds to wait before each call

    def __call__(
        self,
        messages: List[Dict[str, str]],
        stop_sequences: Optional[List[str]] = None,
        grammar: Optional[str] = None,
        max_tokens: int = 1500,
    ) -> str:
        # Fixed delay before every call to stay under the provider's rate limit
        time.sleep(self.call_delay)
        return super().__call__(messages, stop_sequences, grammar, max_tokens)
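A fixed delay wastes time when no limit is being hit. An alternative is to react to the provider's 429 responses with exponential backoff plus jitter. Below is a minimal standalone sketch of that pattern; `RateLimitError` and `with_backoff` are hypothetical names (not part of LiteLLM or smolagents), so in practice you would catch whatever rate-limit exception your client library actually raises:

```python
import random
import time
from functools import wraps


class RateLimitError(Exception):
    """Stand-in for the 429 error your client library raises (hypothetical)."""


def with_backoff(max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a callable with exponential backoff plus jitter on rate-limit errors."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except RateLimitError:
                    if attempt == max_retries - 1:
                        raise  # out of retries, surface the error
                    # Exponential backoff: base, 2*base, 4*base, ... capped at
                    # max_delay, with random jitter to avoid synchronized retries.
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    time.sleep(delay + random.uniform(0, 0.1 * delay))
        return wrapper
    return decorator
```

You could then decorate the model's `__call__` (or any function that hits the API) with `@with_backoff(max_retries=3, base_delay=2.0)` instead of sleeping before every request.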

@aymeric-roucher
Collaborator

@batrlatom and others, if there's interest: don't hesitate to open a PR with a rate limiter added to the base Model class, indeed it could be useful!
