Claude/OpenAI/Google prompt caching #2987
-
Thanks for sharing! We are planning to add more billing logic here, as there are other complexities as well, e.g. Google models having different prices depending on token counts. In the meantime, you can ingest USD costs directly if you want to maintain accurate token counts and USD prices in Langfuse. Example (docs):

```python
langfuse_context.update_current_observation(
    usage={
        "input": response.usage.input_tokens,
        "output": response.usage.output_tokens,
        # Optionally, also ingest USD cost. Alternatively, you can infer it via a model definition in Langfuse.
        # Here we assume the input and output cost are 1 USD each.
        "input_cost": 1,
        "output_cost": 1,
        # "total_cost": float,  # if not set, it is derived from input_cost + output_cost
    }
)
```
-
I made a PR to LangChain to support it.
-
OpenAI has recently released prompt caching as well. It'd be great if the Langfuse wrappers could read cached_tokens from the response and submit the accurate cost.
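For illustration, here is a minimal sketch of the kind of handling meant here, until the wrappers do it automatically. It assumes the OpenAI Python SDK's `usage.prompt_tokens_details.cached_tokens` field and uses illustrative per-token rates (not official pricing); the Langfuse ingestion mirrors the usage/cost example in the first reply above.

```python
# Sketch: read cached_tokens from an OpenAI chat completion and ingest a
# cache-aware USD cost into Langfuse manually. Prices below are illustrative.
from langfuse.decorators import langfuse_context, observe
from openai import OpenAI

client = OpenAI()

INPUT_PRICE = 2.50 / 1_000_000         # USD per uncached input token (illustrative)
CACHED_INPUT_PRICE = 1.25 / 1_000_000  # USD per cached input token (illustrative)
OUTPUT_PRICE = 10.00 / 1_000_000       # USD per output token (illustrative)

@observe(as_type="generation")
def chat(messages):
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    usage = response.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached = (details.cached_tokens if details else 0) or 0
    uncached = usage.prompt_tokens - cached

    langfuse_context.update_current_observation(
        usage={
            "input": usage.prompt_tokens,
            "output": usage.completion_tokens,
            # Ingest the cache-aware cost directly, as suggested earlier in this thread.
            "input_cost": uncached * INPUT_PRICE + cached * CACHED_INPUT_PRICE,
            "output_cost": usage.completion_tokens * OUTPUT_PRICE,
        }
    )
    return response.choices[0].message.content
```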
-
This is high up on our roadmap to add, thanks for your +1 @antoniomdk
-
Any progress on this? Would be great to see it.
-
🎄 We have just released support for arbitrary LLM usage types, including cached / multimodal / reasoning tokens. Feel free to try it out by upgrading your SDKs and adding a custom model definition if necessary, and let us know your feedback!
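A minimal sketch of what this might look like after upgrading, assuming the newer `usage_details` parameter added with this release and Anthropic's cache usage field names; the exact parameter and key names here are assumptions, not a fixed schema, so check the current SDK docs.

```python
# Sketch: ingest arbitrary usage types (incl. cache reads/writes) so that a
# custom model definition in Langfuse can price each key separately.
# Parameter/key names are assumptions based on the release described above.
from langfuse.decorators import langfuse_context, observe

@observe(as_type="generation")
def claude_call(client, **kwargs):
    response = client.messages.create(**kwargs)
    usage = response.usage
    langfuse_context.update_current_observation(
        model="claude-3-5-sonnet-20241022",
        usage_details={
            "input": usage.input_tokens,
            "output": usage.output_tokens,
            # Arbitrary keys are allowed; per-key prices can come from a
            # custom model definition in the Langfuse UI.
            "cache_read_input_tokens": getattr(usage, "cache_read_input_tokens", 0) or 0,
            "cache_creation_input_tokens": getattr(usage, "cache_creation_input_tokens", 0) or 0,
        },
    )
    return response
```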
-
@hassiebp but apparently you're making a bet on the OpenAI format, which counts cache reads (great!) but leaves cache writes implicit, whereas Anthropic sets a higher price for cache writes (and gives you control over them). Maybe adding a cache_writes field to the details and just zeroing it by default would do the trick? It's fine that users would have to provide this manually from Claude's responses; the problem is that there's no place to put it right now, so we still have to do manual shenanigans with the input token count or cost.
-
Describe the feature or potential improvement
Anthropic has recently added prompt caching with its own special pricing
https://www.anthropic.com/news/prompt-caching
Apparently this feature will start popping up in other providers as well. Unfortunately, I didn't find any discussion in LangChain either, so it might be worth starting one to get their point of view on an API standard. Right now, though, we have to dirty-hack Anthropic generations with workarounds like the sketch below to calculate the price correctly:
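To make the kind of workaround concrete, here is a minimal sketch. It assumes Anthropic's `cache_creation_input_tokens` / `cache_read_input_tokens` usage fields and uses illustrative base prices and cache multipliers (not official figures); the Langfuse ingestion mirrors the usage/cost example in the first reply above.

```python
# Rough workaround sketch: fold Anthropic cache writes/reads into a manual
# USD input cost, since the model definition cannot express separate cache
# prices. Prices and multipliers below are illustrative assumptions.
from langfuse.decorators import langfuse_context, observe

BASE_INPUT_PRICE = 3.00 / 1_000_000   # USD per input token (illustrative)
CACHE_WRITE_MULTIPLIER = 1.25          # cache writes billed above base input
CACHE_READ_MULTIPLIER = 0.10           # cache reads billed well below base input
OUTPUT_PRICE = 15.00 / 1_000_000       # USD per output token (illustrative)

@observe(as_type="generation")
def tracked_claude_call(client, **kwargs):
    response = client.messages.create(**kwargs)
    u = response.usage
    cache_writes = getattr(u, "cache_creation_input_tokens", 0) or 0
    cache_reads = getattr(u, "cache_read_input_tokens", 0) or 0

    input_cost = (
        u.input_tokens * BASE_INPUT_PRICE
        + cache_writes * BASE_INPUT_PRICE * CACHE_WRITE_MULTIPLIER
        + cache_reads * BASE_INPUT_PRICE * CACHE_READ_MULTIPLIER
    )
    langfuse_context.update_current_observation(
        usage={
            "input": u.input_tokens + cache_writes + cache_reads,
            "output": u.output_tokens,
            "input_cost": input_cost,
            "output_cost": u.output_tokens * OUTPUT_PRICE,
        }
    )
    return response
```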
Adding cache writes / reads to model pricing in Langfuse would be much cleaner.
Additional information
No response