Releases: BerriAI/litellm
v1.23.3
What's Changed
- [FEAT] 78% Faster s3 Cache ⚡️ - Proxy / litellm.acompletion / litellm.Router.acompletion by @ishaan-jaff in #1891
Full Changelog: v1.23.2...v1.23.3
v1.23.2
What's Changed 🐬
- [FEAT] Azure Pricing - based on base_model in model_info
- [Feat] Semantic Caching - Track Cost of using embedding, Use Langfuse Trace ID
- [Feat] Slack Alert when budget tracking fails
1. [FEAT] Azure Pricing - based on base_model in model_info by @ishaan-jaff in #1874
Azure Pricing - Use base model for cost calculation
Why?
Azure returns gpt-4 in the response when azure/gpt-4-1106-preview is used, so we were using gpt-4 pricing when calculating response_cost.
How to use - set base_model under model_info in your config.yaml:
```yaml
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
```
Viewing the cost calculated on Langfuse, this used the correct pricing for azure/gpt-4-1106-preview: (9 * 0.00001) + (28 * 0.00003)
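To make the arithmetic concrete, here's a minimal sketch of that calculation in plain Python. The token counts and per-token prices are the ones from the example above; check litellm's model_prices_and_context_window.json for current prices.

```python
# Reproduce the response_cost from the example above:
# 9 prompt tokens and 28 completion tokens, priced as azure/gpt-4-1106-preview
# ($0.00001 per input token, $0.00003 per output token).
prompt_tokens, completion_tokens = 9, 28
input_cost_per_token = 0.00001
output_cost_per_token = 0.00003

response_cost = (
    prompt_tokens * input_cost_per_token
    + completion_tokens * output_cost_per_token
)
print(f"response_cost: ${response_cost:.5f}")  # $0.00093
```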
2. [Feat] Semantic Caching - Track Cost of using embedding, Use Langfuse Trace ID by @ishaan-jaff in #1878
- If a trace_id is passed, we'll place the semantic cache embedding call in the same trace
- We now track cost for the API key that makes the embedding call for semantic caching
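For context, a minimal sketch of what passing a trace id can look like from the SDK side - this assumes you forward the Langfuse trace id through the request metadata; the "trace_id" key shown here is illustrative:

```python
import litellm

# Illustrative: pass an existing Langfuse trace_id via metadata, so the
# semantic-cache embedding call lands in the same trace as the completion.
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
    metadata={"trace_id": "my-existing-langfuse-trace-id"},  # assumed key name
)
```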

3. [Feat] Slack Alert when budget tracking fails by @ishaan-jaff in #1877

Full Changelog: v1.23.1...v1.23.2
v1.23.1
What's Changed
- [Feat] add azure/gpt-4-0125-preview by @ishaan-jaff in #1876
Full Changelog: v1.23.0...v1.23.1
v1.23.0
What's Changed
- feat(ui): enable admin to view all valid keys created on the proxy by @krrishdholakia in #1843
- fix(proxy_server.py): prisma client fixes for high traffic by @krrishdholakia in #1860
Full Changelog: v1.22.11...v1.23.0
v1.22.11
Full Changelog: v1.22.10...v1.22.11
v1.22.10
What's Changed
- fix(proxy_server.py): do a health check on db before returning if proxy ready (if db connected) by @krrishdholakia in #1856
- fix(utils.py): return finish reason for last vertex ai chunk by @krrishdholakia in #1847
- fix(proxy/utils.py): if langfuse trace id passed in, include in slack alert by @krrishdholakia in #1839
- [Feat] Budgets for 'user' param passed to /chat/completions, /embeddings etc by @ishaan-jaff in #1859
Semantic Caching Support - Add Semantic Caching to litellm💰 by @ishaan-jaff in #1829
- Use with LiteLLM Proxy https://docs.litellm.ai/docs/proxy/caching
- Use with litellm.completion https://docs.litellm.ai/docs/caching/redis_cache
Usage with Proxy
Step 1: Add cache to the config.yaml
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: azure-embedding-model
    litellm_params:
      model: azure/azure-embedding-model
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"

litellm_settings:
  set_verbose: True
  cache: True  # set cache responses to True, litellm defaults to using a redis cache
  cache_params:
    type: "redis-semantic"
    similarity_threshold: 0.8  # similarity threshold for semantic cache
    redis_semantic_cache_embedding_model: azure-embedding-model  # set this to a model_name set in model_list
```
Step 2: Add Redis Credentials to .env
Set either REDIS_URL or REDIS_HOST in your OS environment to enable caching.
```shell
REDIS_URL = ""       # REDIS_URL='redis://username:password@hostname:port/database'
## OR ##
REDIS_HOST = ""      # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = ""      # REDIS_PORT='18841'
REDIS_PASSWORD = ""  # REDIS_PASSWORD='liteLlmIsAmazing'
```
Additional kwargs
You can pass in any additional redis.Redis arg by storing the variable + value in your OS environment, like this:
```shell
REDIS_<redis-kwarg-name> = ""
```
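For example, ssl is a standard redis.Redis constructor argument, so (assuming the pass-through described above) it could be set like this - illustrative, not an officially documented variable:

```shell
REDIS_SSL = "True"  # forwarded as redis.Redis(ssl=True)
```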
Step 3: Run proxy with config
```shell
$ litellm --config /path/to/config.yaml
```
That's it!
(You'll see semantic-similarity on Langfuse if you set Langfuse as a success_callback)
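To smoke-test the cache, send two semantically similar requests to the proxy; the second should come back from the cache. A minimal example, assuming the proxy runs on its default port 4000, a valid proxy key, and the gpt-3.5-turbo model name from the config above:

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer sk-1234' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a one sentence poem about summer"}]
  }'
```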

Usage with litellm.completion
```python
import os
import random

import litellm
from litellm import completion
from litellm.caching import Cache

litellm.cache = Cache(
    type="redis-semantic",
    host=os.environ["REDIS_HOST"],
    port=os.environ["REDIS_PORT"],
    password=os.environ["REDIS_PASSWORD"],
    similarity_threshold=0.8,
    redis_semantic_cache_embedding_model="text-embedding-ada-002",
)

random_number = random.randint(1, 100000)
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": f"write a one sentence poem about: {random_number}",
        }
    ],
    max_tokens=20,
)
print(f"response1: {response1}")

# a new random number makes the prompt literally different, but semantically similar
random_number = random.randint(1, 100000)
response2 = completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": f"write a one sentence poem about: {random_number}",
        }
    ],
    max_tokens=20,
)
print(f"response2: {response2}")

assert response1.id == response2.id  # response2 served from the semantic cache
```
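Note the assert passes even though the two prompts contain different random numbers: the prompts are semantically similar, so with similarity_threshold=0.8 the second call is answered from the cache and both responses share the same id.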
Budgets for 'user' param passed to /chat/completions, /embeddings etc
Set a budget for the user param passed to /chat/completions, without needing to create a key for every user.
docs: https://docs.litellm.ai/docs/proxy/users
How to Use
- Define a litellm.max_user_budget on your config
```yaml
litellm_settings:
  max_budget: 10          # global budget for proxy
  max_user_budget: 0.0001 # budget for 'user' passed to /chat/completions
```
- Make a /chat/completions call, pass 'user' - the first call works
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
  --data '{
    "model": "azure-gpt-3.5",
    "user": "ishaan3",
    "messages": [
      {
        "role": "user",
        "content": "what time is it"
      }
    ]
  }'
```
- Make a /chat/completions call, pass 'user' - this call fails, since 'ishaan3' is over budget
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
  --data '{
    "model": "azure-gpt-3.5",
    "user": "ishaan3",
    "messages": [
      {
        "role": "user",
        "content": "what time is it"
      }
    ]
  }'
```
Error
```json
{
  "error": {
    "message": "Authentication Error, ExceededBudget: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001",
    "type": "auth_error",
    "param": "None",
    "code": 401
  }
}
```
Full Changelog: v1.22.9...v1.22.10
v1.22.9
What's Changed
- [FEAT] show langfuse logging / cache tags better through proxy by @ishaan-jaff in #1857
- [Feat] Add Semantic Caching to litellm💰 by @ishaan-jaff in #1829
Full Changelog: v1.22.8...v1.22.9
v1.22.8
What's Changed
- [Fix] UI - Security - Litellm UI Keys meant for litellm-dashboard shouldn't be allowed to make non-management related requests by @ishaan-jaff in #1836
- Fix admin UI title and description by @ushuz in #1842
- fix(langfuse.py): support logging failed llm api calls to langfuse by @krrishdholakia in #1837
- [Feat] Proxy set upperbound params for key/generate by @ishaan-jaff in #1844
- build(requirements.txt): update the proxy requirements.txt by @krrishdholakia in #1846
Full Changelog: v1.22.5...v1.22.8
v1.22.5
What's Changed
- Re-raise exception in async ollama streaming by @vanpelt in #1750
- Add a Helm chart for deploying LiteLLM Proxy by @ShaunMaher in #1602
- Update Perplexity models in model_prices_and_context_window.json by @toniengelhardt in #1826
- (feat) Add sessionId for Langfuse. by @Manouchehri in #1828
- [Feat] Sync model_prices_and_context_window.json and litellm/model_prices_and_context_window_backup.json by @ishaan-jaff in #1834
Full Changelog: v1.22.3...v1.22.5
v1.22.3
What's Changed
- feat(utils.py): support cost tracking for openai/azure image gen models by @krrishdholakia in #1805
Full Changelog: v1.22.2...v1.22.3