
can we implement rate limiting? #39

Closed
batrlatom opened this issue Jan 2, 2025 · 10 comments

Comments

@batrlatom

Hi, I have a problem using the Gemini model via LiteLLM: I am getting rate limited very frequently. What about adding some waiting time between calls so it does not happen?

@vanetreg

vanetreg commented Jan 2, 2025

> Hi, I have a problem using the Gemini model via LiteLLM: I am getting rate limited very frequently. What about adding some waiting time between calls so it does not happen?

Ask Gemini to write a rate limit handler for you :)

@batrlatom
Author

>> Hi, I have a problem using the Gemini model via LiteLLM: I am getting rate limited very frequently. What about adding some waiting time between calls so it does not happen?
>
> Ask Gemini to write a rate limit handler for you :)

lol ... it would be rate limited :]

@vanetreg

vanetreg commented Jan 2, 2025

>> Ask Gemini to write a rate limit handler for you :)
>
> lol ... it would be rate limited :]

Do it in any chat, obviously; haven't you tried?
https://huggingface.co/spaces/osanseviero/gemini-coder

@batrlatom
Author

Man, honestly, it's like I could answer any of your questions with the same answer: just generate it.

@luandro

luandro commented Jan 5, 2025

This is pretty relevant, and the responses quite annoying. Please re-open it @batrlatom, unless you've found a way.

@vanetreg

vanetreg commented Jan 5, 2025

> This is pretty relevant, and the responses quite annoying. Please re-open it @batrlatom, unless you've found a way.

@luandro
Why is it annoying to recommend asking an LLM to generate a handler/wrapper function for you that handles rate limits?
We do it all the time in Cursor, Windsurf, Replit, OpenAI Canvas etc.
Just try it, adding the specific rate limits into your prompt.

@luandro

luandro commented Jan 5, 2025

I'm quite familiar with those LLM coding tools, but where do I prompt for this change? What works and what doesn't? They won't substitute for knowing working examples. By sharing what works we can improve together, and other people who face this issue in the future will have a quality reference for how to deal with it.

@batrlatom
Author

Thanks @luandro. I switched to the qwen-2.5-coder model since it gives me slightly better results than Gemini in my case.
But I will try to tackle this issue again. I have found some rate-limiting support in LiteLLM itself (https://docs.litellm.ai/docs/proxy/users#set-rate-limits). I will post a PR if it resolves the issue.

@batrlatom batrlatom reopened this Jan 6, 2025
@batrlatom
Author

Btw, I am not experiencing the problem anymore for now. But if we really want to add limits to avoid hammering Gemini too much, the solution could be as simple as:

import time
from typing import Dict, List, Optional


class RateLimitedLiteLLMModel(LiteLLMModel):
    def __init__(self, model_id="anthropic/claude-3-5-sonnet-20240620", api_base=None, api_key=None, call_delay=0):
        super().__init__(model_id=model_id, api_base=api_base, api_key=api_key)
        self.call_delay = call_delay  # seconds to wait before each call

    def __call__(
        self,
        messages: List[Dict[str, str]],
        stop_sequences: Optional[List[str]] = None,
        grammar: Optional[str] = None,
        max_tokens: int = 1500,
    ) -> str:
        # Fixed delay before every call to stay under the provider's rate limit
        time.sleep(self.call_delay)
        return super().__call__(messages, stop_sequences, grammar, max_tokens)
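A fixed delay wastes time when no limit is being hit. An alternative is to react to the provider's 429 responses with exponential backoff plus jitter. Below is a minimal standalone sketch of that pattern; `RateLimitError` and `with_backoff` are hypothetical names (not part of LiteLLM or smolagents), so in practice you would catch whatever rate-limit exception your client library actually raises:

```python
import random
import time
from functools import wraps


class RateLimitError(Exception):
    """Stand-in for the 429 error your client library raises (hypothetical)."""


def with_backoff(max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a callable with exponential backoff plus jitter on rate-limit errors."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except RateLimitError:
                    if attempt == max_retries - 1:
                        raise  # out of retries, surface the error
                    # Exponential backoff: base, 2*base, 4*base, ... capped at
                    # max_delay, with random jitter to avoid synchronized retries.
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    time.sleep(delay + random.uniform(0, 0.1 * delay))
        return wrapper
    return decorator
```

You could then decorate the model's `__call__` (or any function that hits the API) with `@with_backoff(max_retries=3, base_delay=2.0)` instead of sleeping before every request.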

@aymeric-roucher
Collaborator

@batrlatom and others, if there's interest: don't hesitate to open a PR with a rate limiter added to the base Model class, indeed it could be useful!
