Prerequisites

- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
After running the llama3 model and sending simultaneous conversations with curl, the responses were queued instead of being processed concurrently. I hope the server can support processing multiple sessions at once, similar to multithreading.
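For reference, the queuing can be observed with a test along these lines (hypothetical prompts; assumes the server is on its default port 8080):

```sh
# Two requests fired at the same time; with a single slot the second
# one is queued until the first finishes.
curl http://localhost:8080/completion \
  -d '{"prompt": "First question", "n_predict": 64}' &
curl http://localhost:8080/completion \
  -d '{"prompt": "Second question", "n_predict": 64}' &
wait
```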
Motivation
Implement local multi-session chat.
Possible Implementation
No response
Looking at #5794 it appears that this feature may already be added.
When running the server, did you set the --threads-http parameter? Setting it to a value greater than 1 may let the server handle multiple HTTP requests simultaneously. (Disclaimer: I have not tried this myself, so please let me know if I have misunderstood your report, your request, or the current server capabilities.)
Try setting --cont-batching and --parallel 2. (Edit: it looks like --cont-batching is already enabled by default, so you should only need to set --parallel.)
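A minimal launch sketch, for reference (the model path and context size are placeholders, and the binary may be named server rather than llama-server in older builds):

```sh
# Launch with 2 parallel slots so two requests can decode concurrently.
# -c is the total context size, which is split across the slots.
# --cont-batching is reportedly already on by default in recent builds.
./llama-server -m ./models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf \
  -c 8192 --parallel 2 --cont-batching
```

With two slots, the pair of concurrent curl requests from the report above should stream back at the same time rather than one after the other.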