-
Notifications
You must be signed in to change notification settings - Fork 8.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenAI-API-compatible model not support streaming mode #12143
Comments
Hey @jifei! I'm here to help you with bugs, questions, and more. Let's work together to solve the issue you're facing. I found a similar unsolved discussion regarding the Azure OpenAI API Stream options setting [1]. This might be related to the issue you're experiencing with streaming mode not working in Dify version 0.14.2. For your specific issue, ensure that the If these configurations are correct and the issue persists, it might be beneficial to compare the configurations and code between the older working version and the current version to identify any differences that could be causing the problem. To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
hi is this a workflow? |
Conversation Assistant. |
I have identified the cause of the issue, which arose during the deployment of the Qwen2.5 model experiment with vllm version 0.5.1 and ms-swift version 2.4.2.post1, specifically when using the stream mode. The returned content is as follows: Specifically, the data does not include a leading space after "data:". The issue stems from the outdated versions of vllm or ms-swift, but the direct cause of the incompatibility and failure is due to pull request #11272 by dify. My submission #12171 addresses this problem. @yihong0618 @leslie2046 please review it thank you! |
moonshot and stepfun alsow start with "data:" ? |
Maybe not, but the new code is compatible and maintains code consistency. |
I see,you are right |
Self Checks
Dify version
0.14.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
My-LLM is deployed by MS-Swift, which directly supports streaming mode via POST requests to My-LLM. When using Dify for blocking, it is successful; however, streaming fails. Additionally, it is worth noting that an older version of Dify works without issue.
Fail image and log
[on_llm_before_invoke]
Model: my-llm
Parameters:
Stream: True
User: eea58032-acf5-4b50-8b89-8b14c867e4c7
Prompt messages:
role: user
content: hello
[on_llm_new_chunk]2024-12-27 02:41:52,791.791 DEBUG [Thread-225 (_generate_worker)] [connectionpool.py:243] - Starting new HTTP connection (1): 10.150.60.47:8000
2024-12-27 02:41:52,910.910 DEBUG [Thread-225 (_generate_worker)] [connectionpool.py:546] - http://10.150.60.47:8000 "POST /v1/chat/completions HTTP/11" 200 None
2024-12-27 02:41:53,331.331 INFO [Thread-224 (process_request_thread)] [_internal.py:97] - 172.20.0.9 - - [27/Dec/2024 02:41:53] "POST /console/api/apps/31001cf9-6636-467c-8591-069c106c856d/chat-messages HTTP/1.1" 200 -
[on_llm_after_invoke]
Content:
Model: my-llm
Usage: prompt_tokens=1 prompt_unit_price=Decimal('0') prompt_price_unit=Decimal('0') prompt_price=Decimal('0E-7') completion_tokens=0 completion_unit_price=Decimal('0') completion_price_unit=Decimal('0') completion_price=Decimal('0E-7') total_tokens=1 total_price=Decimal('0E-7') currency='USD' latency=0.5560645920049865
System Fingerprint: None
✔️ Expected Behavior
No response
❌ Actual Behavior
no response
The text was updated successfully, but these errors were encountered: