[Feature]: Clarification Needed for Using trim_messages with Node.js/TypeScript #8470
Comments
cc @krrishdholakia, I believe you got a request around this recently. Might be relevant.
Hey @suysoftware, this isn't currently available on the proxy. How would you expect this to work?
Thanks for the quick response. For our use case, it would be really helpful if we could enable token trimming via a parameter in the request body. For example, if there were an option like "trim_messages": true (or similar) in the request payload, it would streamline our integration without having to implement the trimming on our end. Thanks again for considering this functionality!
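For illustration, a minimal sketch of what such a request might look like from Node.js, assuming the proxy runs on its default port (4000) and the proposed trim_messages flag were accepted in the body. The flag is not supported today; the field name, model, and LITELLM_API_KEY environment variable are placeholders:

```typescript
// Hypothetical request body: "trim_messages" is the flag proposed in this
// issue and is NOT currently supported by the LiteLLM proxy.
const response = await fetch("http://localhost:4000/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.LITELLM_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "...long conversation history..." }],
    trim_messages: true, // proposed: trim input to fit the model's context window
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```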
Sounds good.
That would be super useful!
Note: I am working on a TypeScript project, so our only option is the LiteLLM proxy, as the unofficial LiteLLM JS library is not well maintained. My views below apply only to the proxy. There are some common problems when integrating other LLM models:
It would be awesome if these could be provided in the proxy config.yaml file.
Hey @PaperBoardOfficial, what would you expect to happen here? (Trying to see if we cover this already.) E.g. https://docs.litellm.ai/docs/routing#pre-call-checks-context-window-eu-regions
No, I was not referring to fallback routing based on a context window length check. I was saying that if I am using a 4k model and I pass a message longer than 4k, I should either be able to break the message into chunks and send them to the LLM, or summarize the chunks and then send the summary to the LLM. (This is how CrewAI solves this issue: they keep a hashmap of context window lengths for different models (link), and if the message length exceeds the context window they call the LLM to summarize the text in chunks (link).) See the sketch below for the idea.
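A minimal TypeScript sketch of that chunk-and-summarize idea, assuming a crude ~4-characters-per-token estimate and a proxy at localhost:4000. CONTEXT_WINDOWS, summarizeChunk, and fitToContext are hypothetical helper names, not LiteLLM APIs:

```typescript
// Illustrative sketch only: the context window table and the chunking logic
// are assumptions, not part of LiteLLM.
const PROXY_URL = "http://localhost:4000/chat/completions";

// Lookup table of context window sizes per model, analogous to CrewAI's hashmap.
const CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-3.5-turbo": 4096,
  "gpt-4o": 128000,
};

// Very rough token estimate (~4 characters per token).
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Ask the model (through the proxy) to summarize one chunk of text.
async function summarizeChunk(model: string, chunk: string): Promise<string> {
  const res = await fetch(PROXY_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: "Summarize the following text concisely." },
        { role: "user", content: chunk },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// If the input exceeds the model's context window, split it into chunks,
// summarize each chunk, and return the concatenated summaries.
async function fitToContext(model: string, text: string): Promise<string> {
  const limit = CONTEXT_WINDOWS[model] ?? 4096;
  if (approxTokens(text) <= limit) return text;

  // Reserve ~25% of the window for the system prompt and the summary output.
  const chunkChars = Math.floor(limit * 0.75) * 4;
  const summaries: string[] = [];
  for (let i = 0; i < text.length; i += chunkChars) {
    summaries.push(await summarizeChunk(model, text.slice(i, i + chunkChars)));
  }
  return summaries.join("\n");
}
```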
The Feature
We are currently integrating LiteLLM into our Node.js backend using TypeScript, and we are issuing curl requests directly. However, the documentation for the trim_messages (trimming input messages) feature only demonstrates its usage with the Python SDK. This leaves us unsure about how to leverage this feature when using direct HTTP requests from a Node.js environment.
Could you please provide guidance or documentation on how to use trim_messages (or its equivalent) in our setup?
Motivation, pitch
Any examples or instructions for handling token trimming in a Node.js/TypeScript context would be greatly appreciated.
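Until the proxy exposes such a flag, one possible workaround is to approximate trimming on the client before calling the proxy. Below is a minimal sketch, assuming a crude ~4-characters-per-token estimate (a real tokenizer such as js-tiktoken would be more accurate) and a proxy at localhost:4000; trimMessages and chatWithTrimming are illustrative names, not LiteLLM APIs:

```typescript
// Rough client-side stand-in for trim_messages: keep system messages and drop
// the oldest user/assistant turns until the conversation fits a token budget.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const approxTokens = (m: ChatMessage): number => Math.ceil(m.content.length / 4);
const totalTokens = (msgs: ChatMessage[]): number =>
  msgs.reduce((n, m) => n + approxTokens(m), 0);

function trimMessages(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  // Drop the oldest non-system messages first until we fit the budget.
  while (rest.length > 1 && totalTokens([...system, ...rest]) > maxTokens) {
    rest.shift();
  }
  return [...system, ...rest];
}

async function chatWithTrimming(messages: ChatMessage[]) {
  const trimmed = trimMessages(messages, 3500); // leave headroom under a 4k window
  const res = await fetch("http://localhost:4000/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.LITELLM_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", messages: trimmed }),
  });
  return res.json();
}
```

A trimming flag handled by the proxy itself would let clients drop this kind of logic entirely, which is what this issue is asking for.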
Are you a ML Ops Team?
No
Twitter / LinkedIn details
https://x.com/sezerufukyavuz