
[Ray Serve] need support for streaming outputs #34266

Closed
jiaanguo opened this issue Apr 11, 2023 · 3 comments
Labels
enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks serve Ray Serve Related Issue

Comments

@jiaanguo

jiaanguo commented Apr 11, 2023

Description

I am building a chatbot and using Ray Serve to serve my model. The model already supports streaming output, generating word tokens one after another, but I could not get Ray to send each token back immediately in a streaming manner. This feature would be greatly appreciated!

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from ray import serve

app = FastAPI()

@serve.deployment(route_prefix="/", ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class ChatbotModelDeployment:

    def _infer(...):
        # Yield each token as soon as the model generates it.
        for output in self._model.stream_chat(...):
            yield output + '\n'

    @app.post("/")
    async def query(self, request: Request):
        data = await request.json()
        query = data.get("query", "")

        output = self._infer(query, ...)
        return StreamingResponse(output, media_type="text/plain")


chatbot_model_deployment = ChatbotModelDeployment.bind()

Something like the above, which returns a StreamingResponse from FastAPI. Currently, even when I use StreamingResponse, the whole paragraph is only sent back once all tokens have been generated.

Use case

Especially for NLP model deployments, this feature is needed to send each token back immediately rather than waiting for the whole paragraph to be generated; a client-side sketch of the desired behavior follows.
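
For illustration, a minimal sketch of how a client would consume such a token stream, assuming the deployment above is reachable at http://localhost:8000/ and accepts a JSON body with a "query" field (both assumptions taken from the snippet above):

import requests

# Hypothetical client for the streaming endpoint sketched above; the URL and
# the "query" payload key are assumptions mirroring that snippet.
with requests.post(
    "http://localhost:8000/",
    json={"query": "Tell me a story"},
    stream=True,  # do not buffer the whole response body
) as resp:
    # The server terminates each token with '\n', so iterate line by line.
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)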

@jiaanguo jiaanguo added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 11, 2023
@richardliaw richardliaw added the serve Ray Serve Related Issue label Apr 11, 2023
@akshay-anyscale akshay-anyscale added the P1 Issue that should be fixed within a few weeks label Apr 14, 2023
@hora-anyscale hora-anyscale removed the triage Needs triage (eg: priority, bug/not-bug, and owning component) label Apr 14, 2023
@merionum

Hello guys! Any estimate on when this can be included in Ray Serve? This feature would be crucial for the LLM work happening globally at the moment.

@edoakes
Contributor

edoakes commented May 19, 2023

@merionum we are currently working on adding this and hope to have an experimental version ready for use in Ray 2.5.

@edoakes
Contributor

edoakes commented Jun 8, 2023

This is now available as an experimental feature in 2.5. We plan to have it on by default in 2.6:
https://docs.ray.io/en/latest/serve/http-guide.html#streaming-responses
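
For reference, a minimal sketch of the pattern described in that guide: return a Starlette StreamingResponse whose body is a generator, and Serve forwards each chunk as it is produced. The model call here is a placeholder, and any experimental feature flag required in 2.5 is covered by the linked docs rather than shown here:

from ray import serve
from starlette.requests import Request
from starlette.responses import StreamingResponse


@serve.deployment
class StreamingChatbot:
    def _token_stream(self, query: str):
        # Placeholder for a real token generator, e.g. model.stream_chat(query).
        for token in ("streamed", "tokens", "for:", query):
            yield token + "\n"

    async def __call__(self, request: Request) -> StreamingResponse:
        data = await request.json()
        # Returning a StreamingResponse lets Serve send each chunk as it is generated.
        return StreamingResponse(
            self._token_stream(data.get("query", "")),
            media_type="text/plain",
        )


app = StreamingChatbot.bind()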

@edoakes edoakes closed this as completed Jun 8, 2023