Claude/OpenAI/Google prompt caching #2987
-
Thanks for sharing! We are planning to add more billing logic here, as there are other complexities as well, e.g. Google models having different prices depending on token counts. In the meantime, you can ingest USD costs directly if you want to maintain accurate token counts and USD prices in Langfuse. Example (docs):

```python
langfuse_context.update_current_observation(
    usage={
        "input": response.usage.input_tokens,
        "output": response.usage.output_tokens,
        # Optionally, also ingest USD cost. Alternatively, you can infer it via a model definition in Langfuse.
        # Here we assume the input and output cost are 1 USD each.
        "input_cost": 1,
        "output_cost": 1,
        # "total_cost": float,  # if not set, it is derived from input_cost + output_cost
    }
)
```
-
I made a PR to LangChain to support it.
-
OpenAI has recently released prompt caching as well. It'd be great if the Langfuse wrappers could read cached_tokens from the response and submit the accurate cost.
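For illustration, here is a minimal sketch of the kind of handling meant here, until the wrappers do it automatically. It assumes the OpenAI Python SDK's `usage.prompt_tokens_details.cached_tokens` field and uses illustrative per-token rates (not official pricing); the Langfuse ingestion mirrors the usage/cost example in the first reply above.

```python
# Sketch: read cached_tokens from an OpenAI chat completion and ingest a
# cache-aware USD cost into Langfuse manually. Prices below are illustrative.
from langfuse.decorators import langfuse_context, observe
from openai import OpenAI

client = OpenAI()

INPUT_PRICE = 2.50 / 1_000_000         # USD per uncached input token (illustrative)
CACHED_INPUT_PRICE = 1.25 / 1_000_000  # USD per cached input token (illustrative)
OUTPUT_PRICE = 10.00 / 1_000_000       # USD per output token (illustrative)

@observe(as_type="generation")
def chat(messages):
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    usage = response.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached = (details.cached_tokens if details else 0) or 0
    uncached = usage.prompt_tokens - cached

    langfuse_context.update_current_observation(
        usage={
            "input": usage.prompt_tokens,
            "output": usage.completion_tokens,
            # Ingest the cache-aware cost directly, as suggested earlier in this thread.
            "input_cost": uncached * INPUT_PRICE + cached * CACHED_INPUT_PRICE,
            "output_cost": usage.completion_tokens * OUTPUT_PRICE,
        }
    )
    return response.choices[0].message.content
```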
-
This is high up on our roadmap to add, thanks for your +1 @antoniomdk
-
Any progress on this? Would be great to see it.
-
🎄 We have just released support for arbitrary LLM usage types, including cached / multimodal / reasoning tokens. Feel free to try it out by upgrading your SDKs and adding a custom model definition if necessary, and let us know your feedback!
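A minimal sketch of what this might look like after upgrading, assuming the newer `usage_details` parameter added with this release and Anthropic's cache usage field names; the exact parameter and key names here are assumptions, not a fixed schema, so check the current SDK docs.

```python
# Sketch: ingest arbitrary usage types (incl. cache reads/writes) so that a
# custom model definition in Langfuse can price each key separately.
# Parameter/key names are assumptions based on the release described above.
from langfuse.decorators import langfuse_context, observe

@observe(as_type="generation")
def claude_call(client, **kwargs):
    response = client.messages.create(**kwargs)
    usage = response.usage
    langfuse_context.update_current_observation(
        model="claude-3-5-sonnet-20241022",
        usage_details={
            "input": usage.input_tokens,
            "output": usage.output_tokens,
            # Arbitrary keys are allowed; per-key prices can come from a
            # custom model definition in the Langfuse UI.
            "cache_read_input_tokens": getattr(usage, "cache_read_input_tokens", 0) or 0,
            "cache_creation_input_tokens": getattr(usage, "cache_creation_input_tokens", 0) or 0,
        },
    )
    return response
```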
-
@hassiebp but apparently you're making a bet on the OpenAI format, which counts cache reads (great!) but leaves cache writes implicit, whereas Anthropic sets a higher price for cache writes (and gives you control over them). Maybe adding a cache_writes field to the details and just zeroing it by default would do the trick? It's fine that users would have to provide this manually from Claude's responses; the problem is that there's no place to put it right now, so we still have to do manual shenanigans with the input token count or cost.
-
Describe the feature or potential improvement
Anthropic has recently added prompt caching with its own special pricing
https://www.anthropic.com/news/prompt-caching
Apparently this feature will start popping up in other providers as well. Unfortunately, I didn't find any discussion in LangChain either, so it might be worth starting one to get their point of view on an API standard. Right now, though, we have to dirty-hack Anthropic generations with workarounds like the sketch below to calculate the price correctly:
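To make the kind of workaround concrete, here is a minimal sketch. It assumes Anthropic's `cache_creation_input_tokens` / `cache_read_input_tokens` usage fields and uses illustrative base prices and cache multipliers (not official figures); the Langfuse ingestion mirrors the usage/cost example in the first reply above.

```python
# Rough workaround sketch: fold Anthropic cache writes/reads into a manual
# USD input cost, since the model definition cannot express separate cache
# prices. Prices and multipliers below are illustrative assumptions.
from langfuse.decorators import langfuse_context, observe

BASE_INPUT_PRICE = 3.00 / 1_000_000   # USD per input token (illustrative)
CACHE_WRITE_MULTIPLIER = 1.25          # cache writes billed above base input
CACHE_READ_MULTIPLIER = 0.10           # cache reads billed well below base input
OUTPUT_PRICE = 15.00 / 1_000_000       # USD per output token (illustrative)

@observe(as_type="generation")
def tracked_claude_call(client, **kwargs):
    response = client.messages.create(**kwargs)
    u = response.usage
    cache_writes = getattr(u, "cache_creation_input_tokens", 0) or 0
    cache_reads = getattr(u, "cache_read_input_tokens", 0) or 0

    input_cost = (
        u.input_tokens * BASE_INPUT_PRICE
        + cache_writes * BASE_INPUT_PRICE * CACHE_WRITE_MULTIPLIER
        + cache_reads * BASE_INPUT_PRICE * CACHE_READ_MULTIPLIER
    )
    langfuse_context.update_current_observation(
        usage={
            "input": u.input_tokens + cache_writes + cache_reads,
            "output": u.output_tokens,
            "input_cost": input_cost,
            "output_cost": u.output_tokens * OUTPUT_PRICE,
        }
    )
    return response
```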
Adding cache writes / reads to model pricing in Langfuse would be much cleaner.
Additional information
No response