
[Ray Serve] need support for streaming outputs #34266

Closed
jiaanguo opened this issue Apr 11, 2023 · 3 comments
Labels
enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks serve Ray Serve Related Issue

Comments

@jiaanguo

jiaanguo commented Apr 11, 2023

Description

I am building a chatbot and using Ray Serve to serve my model. The model already supports streaming output, generating word tokens one after another, but I could not get Ray to send each token back immediately in a streaming manner. This feature would be greatly appreciated!

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from ray import serve

app = FastAPI()

@serve.deployment(route_prefix="/", ray_actor_options={"num_gpus": 1})
@serve.ingress(app)
class ChatbotModelDeployment:

    def _infer(...):
        # Yield each token as soon as the model generates it.
        for output in self._model.stream_chat(...):
            yield output + '\n'

    @app.post("/")
    async def query(self, request: Request):
        data = await request.json()
        query = data.get("query", "")

        output = self._infer(query, ...)
        return StreamingResponse(output, media_type="text/plain")


chatbot_model_deployment = ChatbotModelDeployment.bind()

Something like the above, which returns a StreamingResponse from FastAPI. Currently, even when I use StreamingResponse, the whole paragraph is only sent back once all tokens have been generated.

Use case

Especially for NLP model deployments, this feature is needed to send each token back immediately rather than waiting for the whole paragraph to be generated; a client-side sketch of the desired behavior follows.
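
For illustration, a minimal sketch of how a client would consume such a token stream, assuming the deployment above is reachable at http://localhost:8000/ and accepts a JSON body with a "query" field (both assumptions taken from the snippet above):

import requests

# Hypothetical client for the streaming endpoint sketched above; the URL and
# the "query" payload key are assumptions mirroring that snippet.
with requests.post(
    "http://localhost:8000/",
    json={"query": "Tell me a story"},
    stream=True,  # do not buffer the whole response body
) as resp:
    # The server terminates each token with '\n', so iterate line by line.
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)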

@jiaanguo jiaanguo added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 11, 2023
@richardliaw richardliaw added the serve Ray Serve Related Issue label Apr 11, 2023
@akshay-anyscale akshay-anyscale added the P1 Issue that should be fixed within a few weeks label Apr 14, 2023
@hora-anyscale hora-anyscale removed the triage Needs triage (eg: priority, bug/not-bug, and owning component) label Apr 14, 2023
@merionum

Hello guys! Any estimate on when this can be included in Ray Serve? This feature would be crucial for the LLM work happening globally at the moment.

@edoakes
Contributor

edoakes commented May 19, 2023

@merionum we are currently working on adding this and hope to have an experimental version ready for use in Ray 2.5.

@edoakes
Contributor

edoakes commented Jun 8, 2023

This is now available as an experimental feature in 2.5. We plan to have it on by default in 2.6:
https://docs.ray.io/en/latest/serve/http-guide.html#streaming-responses
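
For reference, a minimal sketch of the pattern described in that guide: return a Starlette StreamingResponse whose body is a generator, and Serve forwards each chunk as it is produced. The model call here is a placeholder, and any experimental feature flag required in 2.5 is covered by the linked docs rather than shown here:

from ray import serve
from starlette.requests import Request
from starlette.responses import StreamingResponse


@serve.deployment
class StreamingChatbot:
    def _token_stream(self, query: str):
        # Placeholder for a real token generator, e.g. model.stream_chat(query).
        for token in ("streamed", "tokens", "for:", query):
            yield token + "\n"

    async def __call__(self, request: Request) -> StreamingResponse:
        data = await request.json()
        # Returning a StreamingResponse lets Serve send each chunk as it is generated.
        return StreamingResponse(
            self._token_stream(data.get("query", "")),
            media_type="text/plain",
        )


app = StreamingChatbot.bind()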

@edoakes edoakes closed this as completed Jun 8, 2023