[Ray Serve] need support for streaming outputs #34266
Labels
enhancement
Request for new feature and/or capability
P1
Issue that should be fixed within a few weeks
serve
Ray Serve Related Issue
Description
I am building a chatbot with Ray Serve serving my model. The model supports streaming outputs, generating word tokens one after another, but I cannot get Ray to send each token back immediately in a streaming manner. This feature would be greatly appreciated!
Something like the above that allows StreamingResponse from FastAPI would work. I tried using StreamingResponse, but the response still arrives as the whole paragraph only after all tokens have been generated.
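For reference, a minimal sketch of the setup described above. The `generate_tokens` placeholder and the `/chat` route are illustrative, not from the original report; the point is that even with FastAPI's StreamingResponse, the body is only delivered once the generator is exhausted when served behind Ray Serve's HTTP proxy.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from ray import serve

app = FastAPI()


def generate_tokens(prompt: str):
    # Placeholder for a real autoregressive model that yields tokens one by one.
    for token in ["Hello", " ", "world", "!"]:
        yield token


@serve.deployment
@serve.ingress(app)
class Chatbot:
    @app.get("/chat")
    def chat(self, prompt: str) -> StreamingResponse:
        # StreamingResponse should flush each token as it is produced,
        # but through the Serve proxy the full body arrives at once.
        return StreamingResponse(generate_tokens(prompt), media_type="text/plain")


serve.run(Chatbot.bind())
```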
Use case
Especially in NLP model deployment, this feature is needed to send tokens back as they are generated rather than waiting for the whole paragraph to be completed.